Executive Summary



The Progress in International Reading Literacy Study 
(PIRLS) is an assessment of the reading comprehension 
of students in their fourth year of schooling. In 2006, 
PIRLS was administered to a nationally representative 
sample of fourth-grade students in the United States, 
as well as to students in 44 other jurisdictions around 
the world.¹ The PIRLS assessment measures student
performance on a combined reading literacy scale and
on literary and informational subscales. The literary
subscale assessed performance in reading for literary
experience, and the informational subscale assessed
performance in acquiring and using information.

This report compares the performance of U.S. students 
with their peers around the world and also examines how 
the reading literacy of U.S. fourth-grade students has 
changed since the first administration of PIRLS in 2001. 
Results are presented by student background character- 
istics (sex and race/ethnicity) and by contextual factors 
that may be associated with reading proficiency (school 
characteristics, instructional practices and teacher prep- 
aration, and the home environment for reading). 

On the combined reading literacy scale in 2006, 

• Average scores for U.S. students (540) were higher 
than the scores for students in 22 jurisdictions; 

• Average scores for U.S. students were lower than 
the scores for students in 10 jurisdictions; 

• There were no measurable differences between
average scores for U.S. students and the scores
for students in 12 jurisdictions;

• The percentage of U.S. students at or above each
of the four international benchmarks was higher
than the international median percentage (96
versus 94 for the low international benchmark,
82 versus 76 for the intermediate international
benchmark, 47 versus 41 for the high international
benchmark, and 12 versus 7 for the advanced
international benchmark);

• Average scores for girls were higher than
average scores for boys in the United States (545
versus 535) and in all other jurisdictions except
two, where there were no measurable differences
between the sexes; and

• Average scores for White, non-Hispanic (560);
Asian, non-Hispanic (567); and non-Hispanic students
in the racial groups classified as other (573; see
appendix B for race/ethnicity classification) in the
United States were higher than the scores for Black,
non-Hispanic (503); Hispanic (518); and American
Indian/Alaska Native, non-Hispanic (468) students.

Between 2001 and 2006,

• There were no measurable differences in average scores
for U.S. students on the combined reading literacy
scale or on the literary or informational subscales;

• Average scores on the combined reading literacy
scale increased for students in 8 jurisdictions,
decreased for students in 6 jurisdictions, and
did not measurably differ for students in 14
jurisdictions; and

• The average number of years of experience for U.S.
teachers of fourth-grade students decreased from
15 to 12 years.

¹The assessment is open to countries and subnational entities.
In this report, participating countries and subnational enti-
ties are both referred to as "jurisdictions."
Introduction



The Progress in International Reading Literacy Study 
(PIRLS) is a continuing assessment of the reading com- 
prehension of students in their fourth year of schooling 
in jurisdictions around the world. PIRLS not only helps 
participating jurisdictions understand the literacy skills 
of their students but also places the literacy of young 
readers within an international context. Drawing comparisons
between jurisdictions reveals areas of strength
as well as areas in need of improvement, offering juris- 
dictions insight into how the reading literacy of their 
students may be enhanced. 

PIRLS is conducted by the International Association for 
the Evaluation of Educational Achievement (IEA), with
national sponsors in each participating jurisdiction. In 
the United States, PIRLS is sponsored by the National 
Center for Education Statistics (NCES), in the Institute of 
Education Sciences in the U.S. Department of Education. 

PIRLS 2006 was the second cycle of the study, which 
was first administered in 2001. The assessment is open 
to countries and subnational entities. In this report, 
participating countries and subnational entities are 
both referred to as "jurisdictions." In 2006, forty-five 
jurisdictions, including the United States, participated 
in PIRLS (figure 1). In addition to 38 participating 
countries, this total includes 5 participating Canadian 
provinces and 2 separate samples of students that were 
assessed in Belgium.² The United States was one of 29
jurisdictions to participate in both the 2001 and 2006
administrations of PIRLS.

²The two major geographic and cultural regions of Belgium
(Flemish and French) have separate educational systems and
were each assessed in PIRLS. Throughout the report, Belgium
(Flemish) and Belgium (French) are reported as separate
jurisdictions.



Figure 1. Jurisdictions participating in PIRLS: 2001 and 2006

Participated in 2001 and 2006: Bulgaria; Canada, Ontario; Canada, Quebec; England; France; Germany; Hong Kong, SAR¹; Hungary; Iceland; Iran; Israel; Italy; Kuwait; Latvia; Lithuania; Macedonia; Moldova; Morocco; Netherlands; New Zealand; Norway; Romania; Russian Federation; Scotland; Singapore; Slovak Republic; Slovenia; Sweden; United States

Participated in 2006 only: Austria; Belgium (Flemish); Belgium (French); Canada, Alberta; Canada, British Columbia; Canada, Nova Scotia; Chinese Taipei; Denmark; Georgia; Indonesia; Luxembourg; Poland; Qatar; South Africa; Spain; Trinidad and Tobago

¹Hong Kong, SAR, is a Special Administrative Region (SAR) of the People's Republic of China.

SOURCE: International Association for the Evaluation of Educational Achievement, Progress in International Reading Literacy Study (PIRLS), 2001 and 2006.

This report summarizes the performance of U.S. fourth-
grade students on the three separate scales (two
literacy subscales and the combined scale) that make
up the PIRLS assessment. The analyses presented help
address three questions:

• How does the reading literacy of U.S. fourth-grade 
students compare with the reading literacy of 
fourth-grade students internationally? 

• How does the reading literacy of U.S. fourth-grade 
students vary by student background character- 
istics, school and classroom characteristics, and 
home reading environment? 

• How has the reading literacy of U.S. fourth-grade 
students changed since 2001? 

Results and comparisons for all participating jurisdic- 
tions in PIRLS 2006, as well as technical documentation 
for the assessment, are available on the Internet at
www.pirls.org.

Defining and measuring reading literacy

PIRLS defines reading literacy as 

the ability to understand and use those written 
language forms required by society and/or valued
by the individual. Young readers can construct 
meaning from a variety of texts. They read to 
learn, to participate in communities of readers 
in school and everyday life, and for enjoyment. 
(Mullis et al. 2006) 

Within this context, the study examines three dimen- 
sions of reading literacy: 

• processes of comprehension;³

• purposes of reading; and

• reading behaviors and attitudes.

³See Mullis et al. (2007) for results of analyses examining
processes of comprehension.



The distribution of PIRLS items across the first two 
dimensions, processes of comprehension and purposes 
of reading, is shown in table 1. Both dimensions were 
measured through the PIRLS assessment items admin- 
istered to each participating student. The third dimen- 
sion, reading behaviors and attitudes, was measured 
through a separate background questionnaire adminis- 
tered to participating students. 

The processes of comprehension dimension describes 
how young readers interpret and make sense of text. 
PIRLS assesses students' abilities to (1) focus on 
and retrieve explicitly stated information, (2) make 
straightforward inferences, (3) interpret and integrate 
ideas and information, and (4) examine and evaluate 
content, language, and textual elements. 

The purposes of reading dimension describes the two 
main reasons why young students read printed materi- 
als: (1) for literary experience and (2) to acquire and 
use information. Fictional texts are used to measure the 
ability of students to read for literary experience, and 
nonfictional texts are used to measure their skills at 
acquiring and using information. 

Table 1. Distribution of PIRLS items measuring processes of
comprehension and purposes of reading: 2006

Classification of items                                         Number of items
Processes of comprehension
   Total                                                                    126
   Focus on and retrieve explicitly stated information                       31
   Make straightforward inferences                                           43
   Interpret and integrate ideas and information                             34
   Examine and evaluate content, language, and textual elements              18
Purposes of reading
   Total                                                                    126
   Literary experience                                                       64
   Acquire and use information                                               62

SOURCE: International Association for the Evaluation of Educa-
tional Achievement, Progress in International Reading Literacy
Study (PIRLS), 2006.

Results from the PIRLS assessment are reported on
subscales that measure the two types of purposes of
reading: reading for literary experience and reading to
acquire and use information. Additionally, results are
reported on a combined reading literacy scale, which
captures students' overall literacy skills related to both
processes of comprehension and purposes of reading.
This report emphasizes results from the combined read-
ing literacy scale because the scale summarizes student
performance on the two cognitive dimensions of read-
ing literacy in a single measure.⁴

The texts for the PIRLS assessment were submitted by
the participating jurisdictions and reflect the kinds of 
printed materials read by children in those jurisdic- 
tions. All participating jurisdictions used the same 
texts. The passages were reviewed by the PIRLS Reading 
Development Group, an international advisory panel 
that selected texts for the assessment that reflected the 
jurisdictions and cultures participating in PIRLS. 

Design and administration of PIRLS 2006

PIRLS consists of two main components: (1) a literacy
assessment administered to sampled fourth-grade stu-
dents and (2) background questionnaires administered
to students, their teachers, and the administrators
in the schools in which the sampled students were
enrolled.⁵ Procedures for sampling students and admin-
istering the study were established by the IEA and
then implemented in each participating jurisdiction.
In the United States, the PIRLS sample was designed
to be representative of all fourth-grade students in the
50 states and the District of Columbia. Quality control
monitors trained by the IEA visited schools in each
jurisdiction to ensure that the procedures specified by
the IEA were implemented properly.

⁴See appendix B for more information about the items
comprising the PIRLS scales.

⁵All jurisdictions other than the United States also adminis-
tered a background questionnaire to students' parents or legal
guardians.



The U.S. sample consisted of 222 schools, of which 214 
were eligible (8 schools had closed and were designated 
as ineligible). One hundred and twenty of the original 
sample schools participated, for a weighted response 
rate of 57 percent.⁶ An additional 63 replacement schools
also participated, for a total of 183 schools, or an 86
percent weighted school response rate.⁷ Information
about the size of each fourth-grade class was collected 
from participating schools, and a random sample of 
one or two classes from each school was selected. All 
students from selected classrooms were asked to partici- 
pate. Of the 256 classrooms sampled, 255 participated, 
or 99 percent. Within these classrooms, 5,442 students 
were eligible and 5,190 completed the assessment for a 
weighted student response rate of 95 percent. 
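As a rough cross-check on the counts above, the simple (unweighted) participation ratios can be computed directly. This is a minimal sketch, not the IEA procedure: the published rates use final adjusted weights and the IEA formulas, so the unweighted values below are illustrative only and need not match the reported percentages exactly.

```python
# Unweighted participation ratios from the U.S. counts reported for PIRLS 2006.
# Assumption: plain counts only; the published rates use final adjusted weights
# and IEA formulas, so these values are illustrative, not official.

def rate(participating: int, eligible: int) -> float:
    """Return participating / eligible as a percentage."""
    return 100.0 * participating / eligible

ELIGIBLE_SCHOOLS = 214          # 222 sampled schools minus 8 closed (ineligible) schools
ORIGINAL_PARTICIPANTS = 120     # original sample schools that took part
REPLACEMENT_PARTICIPANTS = 63   # replacement schools that took part

rates = {
    "schools, before replacement": rate(ORIGINAL_PARTICIPANTS, ELIGIBLE_SCHOOLS),
    "schools, after replacement": rate(
        ORIGINAL_PARTICIPANTS + REPLACEMENT_PARTICIPANTS, ELIGIBLE_SCHOOLS
    ),
    "classrooms": rate(255, 256),
    "students": rate(5190, 5442),
}

for label, value in rates.items():
    print(f"{label:<28}{value:6.1f}%")
```

Run as written, the two school ratios come out at roughly 56 and 85.5 percent, a little below the weighted 57 and 86 percent reported above.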

A total of 10 reading passages, 4 from PIRLS 2001 and 
6 developed for the 2006 administration, were included 
in the assessment booklets used in all participating 
jurisdictions. The use of common passages in the 2001 
and 2006 assessments allows the analysis of changes 
in reading literacy over the 5-year period between 
administrations for jurisdictions that participated in 
both cycles. The passages, as well as all other study 
materials, were translated into the primary language or
languages of instruction in each jurisdiction. 

Students who participated in the assessment received a 
test booklet containing two passages and were asked to 
answer a series of multiple-choice and open-ended ques- 
tions related to the passages. Student responses were 
scored in each jurisdiction following standardized scoring 
procedures outlined and monitored by the IEA. Sample
responses to one of the reading passages included in the 
2006 assessment are shown in appendix A. 

Further information about the design and administra- 
tion of PIRLS is provided in appendix B. 



⁶All weighted response rates discussed in this report refer to
final adjusted weights.

⁷Response rates are calculated using the formulas developed
by the IEA for PIRLS. The standard NCES formula would result
in a lower school response rate of approximately 63 percent.
Reporting student results on PIRLS

Results from PIRLS are reported in two ways: (1) as 
average scale scores and (2) as the percentage of stu- 
dents reaching each of the PIRLS international bench- 
mark levels. 

Average scale scores 

PIRLS scores are reported on a scale from 0 to 1,000 with
the scale average fixed at 500 and a standard deviation 
of 100. The PIRLS scale average was set in 2001 and 
reflects the combined proficiency distribution of all stu- 
dents in all jurisdictions participating in 2001. To allow 
comparisons between 2001 and 2006, scores of students 
in jurisdictions that participated in both 2001 and 2006 
(29 jurisdictions) were used to scale the 2006 results. 
The 2006 scores were linked to the 2001 scale using com- 
mon items on both assessments. Once scores from the 
2006 assessment were scaled to the 2001 scale, scores 
of students in jurisdictions that participated in 2006 but 
not in 2001 were placed on the PIRLS scale. 
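The scale-setting step described above can be pictured as a linear transformation. This is a simplified sketch under stated assumptions: `theta_2001` and `theta_2006` are hypothetical proficiency estimates on an underlying IRT metric, and sampling weights, plausible values, and the common-item linking itself are omitted. What it illustrates is the point of the paragraph: the constants are fixed from the 2001 distribution and then reused for 2006, so a 2006 average need not equal 500.

```python
from statistics import fmean, pstdev
from typing import Callable, List

def make_pirls_transform(theta_2001: List[float]) -> Callable[[float], float]:
    """Return a map that puts the pooled 2001 distribution at mean 500 and SD 100.

    Hypothetical sketch: unweighted mean and population SD of the 2001 estimates.
    """
    mu = fmean(theta_2001)
    sigma = pstdev(theta_2001)

    def to_scale(theta: float) -> float:
        return 500.0 + 100.0 * (theta - mu) / sigma

    return to_scale

# Hypothetical 2001 proficiency estimates (all 2001 jurisdictions pooled).
theta_2001 = [-1.2, -0.4, 0.0, 0.3, 0.9, 1.6]
to_scale = make_pirls_transform(theta_2001)

scaled_2001 = [to_scale(t) for t in theta_2001]
print(f"2001 pool: mean {fmean(scaled_2001):.1f}, SD {pstdev(scaled_2001):.1f}")

# Hypothetical 2006 estimates, assumed to be already linked to the 2001 metric.
theta_2006 = [-0.8, 0.1, 0.5, 1.1]
scaled_2006 = [to_scale(t) for t in theta_2006]
print(f"2006 sample: mean {fmean(scaled_2006):.1f}")
```

Because the 2006 estimates reuse the 2001 constants, their scaled mean lands slightly above 500 here, which is the sense in which 2006 results are "placed on" the 2001 scale.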



PIRLS international benchmarks 

The PIRLS international benchmarks provide a way to 
interpret scale scores and to understand how students' 
proficiency varies along the PIRLS scale. In 2001, the 
cutpoints for the PIRLS benchmarks were set on the 
basis of the distribution of students along the PIRLS 
scale (the top 10 percent, the upper quartile, the 
median, and the lower quartile). In 2006, the cutpoints 
were revised to be identical to the cutpoints used for 
the Trends in International Mathematics and Science 
Study (TIMSS), which is also conducted by the IEA.
Information about the rationale underlying the bench- 
marks and the procedures used to set the cutpoints is 
available in Martin et al. (2007). Figure 2 describes 
the international benchmarks introduced for the 2006 
assessment. 
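For contrast with the fixed 2006 cutpoints, the 2001 approach of deriving cutpoints from the score distribution can be sketched with the Python standard library. This is an illustration under assumptions: the scores are simulated rather than PIRLS data, and the unweighted percentiles ignore the sampling weights and plausible values an operational analysis would use.

```python
import random
from statistics import quantiles
from typing import Dict, List

def percentile_cutpoints(scores: List[float]) -> Dict[str, float]:
    """Distribution-based cutpoints: lower quartile, median, upper quartile, top 10 percent."""
    pct = quantiles(scores, n=100, method="inclusive")  # pct[k - 1] is the k-th percentile
    return {
        "lower quartile": pct[24],
        "median": pct[49],
        "upper quartile": pct[74],
        "top 10 percent": pct[89],
    }

# Simulated scores on the PIRLS metric (mean 500, SD 100); not PIRLS data.
random.seed(2001)
scores = [random.gauss(500, 100) for _ in range(10_000)]

cuts = percentile_cutpoints(scores)
for name, value in cuts.items():
    print(f"{name:<15}{value:7.1f}")
```

With this simulated normal sample the values land near 433, 500, 567, and 628. Distribution-based cutpoints move whenever the reference distribution changes, whereas the 2006 cutpoints (400, 475, 550, 625) are fixed points on the scale.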

Figure 2. Description of PIRLS international benchmarks: 2006

Benchmark       Cutpoint   Reading skills and strategies
Advanced        625        • Interpret figurative language
                           • Distinguish and interpret complex information from different parts of text
                           • Integrate ideas across text to provide interpretations about characters' feelings and behaviors
High            550        • Recognize some textual features, such as figurative language and abstract messages
                           • Make inferences on the basis of abstract or embedded information
                           • Integrate information to recognize main ideas and provide explanations
Intermediate    475        • Identify central events, plot sequences, and relevant story details
                           • Make straightforward inferences from the text
                           • Begin to make connections across parts of the text
Low             400        • Retrieve explicitly stated details from literary and informational texts

NOTE: Information about the procedures used to set the international benchmarks is available in the PIRLS 2006 Technical Report
(Martin, Mullis, and Kennedy 2007).

SOURCE: International Association for the Evaluation of Educational Achievement, Progress in International Reading Literacy Study (PIRLS), 2006.

The skills and strategies associated with each level were
developed by the PIRLS Reading Development Group,
which reviewed a sample of student responses to the
assessment items. Each international benchmark describes
the reading skills and strategies associated with specific
scores on the combined reading literacy scale. For example,
students with scores equal to or greater than 400 on the
combined reading literacy scale met the low international
benchmark. This means that these students could retrieve
explicitly stated details from literary and informational
texts. Students who scored at or above the cutpoint for
the next benchmark (intermediate, at 475) could accom-
plish the reading skills and strategies associated with the
low benchmark, as well as the reading skills and strategies
associated with the intermediate benchmark.
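Because the benchmarks are cumulative, classifying a score amounts to finding the highest cutpoint it equals or exceeds. The sketch below encodes the 2006 cutpoints from figure 2; the helper names are hypothetical, and the percentages it computes are unweighted, unlike the published estimates, which rely on sampling weights.

```python
from bisect import bisect_right
from typing import Dict, List, Optional

# 2006 PIRLS international benchmark cutpoints on the combined reading literacy scale.
CUTPOINTS = [400, 475, 550, 625]
NAMES = ["low", "intermediate", "high", "advanced"]

def highest_benchmark(score: float) -> Optional[str]:
    """Highest benchmark whose cutpoint the score equals or exceeds (None if below 400)."""
    reached = bisect_right(CUTPOINTS, score)  # number of cutpoints <= score
    return NAMES[reached - 1] if reached else None

def percent_at_or_above(scores: List[float]) -> Dict[str, float]:
    """Unweighted percentage of scores at or above each cutpoint."""
    n = len(scores)
    return {
        name: 100.0 * sum(score >= cut for score in scores) / n
        for name, cut in zip(NAMES, CUTPOINTS)
    }

# Hypothetical student scores, not PIRLS data.
sample = [390, 400, 474, 540, 625]
for s in sample:
    print(s, highest_benchmark(s))  # 390 None, 400 low, 474 low, 540 intermediate, 625 advanced (one per line)
print(percent_at_or_above(sample))  # {'low': 80.0, 'intermediate': 40.0, 'high': 20.0, 'advanced': 20.0}
```

A score exactly at a cutpoint (400 or 625 here) counts as reaching that benchmark, matching the "equal to or greater than" wording above, and each percentage includes every student at the higher benchmarks as well.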

Organization of the report 

This report is divided into five sections. Following this 
introduction, the next section compares the reading 
literacy of U.S. fourth-grade students with the literacy 
of their peers internationally and also examines changes 



in literacy between 2001 and 2006. The third section 
on student background characteristics explores differ- 
ences among U.S. students by sex and race/ethnicity. 
The fourth section compares the reading literacy of U.S. 
fourth-grade students on the basis of school characteris- 
tics. The final section examines the relationship between 
literacy and the home environment for reading. 

All differences between or among groups discussed 
in this report are statistically significant at the .05 
level of statistical significance. Information about the 
tests conducted to determine statistical significance is 
provided in appendix B. Supplementary tables showing
all estimates and standard errors discussed in this
report are available at
http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2008017.
More information about U.S. participation in PIRLS is
available on the NCES website at
http://nces.ed.gov/surveys/pirls.
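The kind of test behind these comparisons can be sketched for the simplest case: two independent estimates, each with a known standard error. This is an assumption-based stand-in for the procedures documented in appendix B; it treats the samples as independent, uses a normal approximation, and the numbers in the example are invented for illustration.

```python
from math import erf, sqrt
from typing import Tuple

def compare_independent(est_a: float, se_a: float, est_b: float, se_b: float) -> Tuple[float, float]:
    """Test statistic and two-sided p-value for the difference between two independent estimates.

    Uses a normal approximation to the reference distribution (an assumption).
    """
    t = (est_a - est_b) / sqrt(se_a ** 2 + se_b ** 2)
    # Standard normal CDF via the error function: Phi(x) = 0.5 * (1 + erf(x / sqrt(2))).
    phi = 0.5 * (1.0 + erf(abs(t) / sqrt(2.0)))
    return t, 2.0 * (1.0 - phi)

# Hypothetical averages and standard errors, not values from the report.
t, p = compare_independent(552.0, 2.5, 540.0, 3.0)
print(f"t = {t:.2f}, p = {p:.4f}, significant at .05: {p < 0.05}")
```

Comparing the United States with another jurisdiction is the case where independence is a reasonable assumption, since the national samples are drawn separately; the standard errors themselves would come from the supplementary tables.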
Reading Literacy in the United States and Internationally



Results from PIRLS 2006 reveal how the reading literacy 
of U.S. fourth-grade students compares with the read- 
ing literacy of students internationally, as well as how 
reading literacy has changed since the first administra- 
tion of PIRLS in 2001. In addition to reporting average 
scores on the combined reading literacy scale and the 
literary and informational subscales, results for 2006 
are shown by each of the four PIRLS international 
benchmarks. 

Average scores in 2006 

The average score for U.S. fourth-grade students on the 
combined reading literacy scale (540) was higher than 
the PIRLS scale average (500) and also higher than the 
average scores for students in 22 of the 45 participat- 
ing PIRLS jurisdictions (figure 3). The U.S. average 
was lower than the average score in 10 jurisdictions. 
There were no measurable differences between the U.S. 
average and the average scores in the 12 remaining 
jurisdictions. 

The U.S. averages on the literary subscale (541) and
the informational subscale (537) were both higher than
the PIRLS scale average (500). On the literary subscale,
U.S. students outperformed students in 23 jurisdictions,
and students in 9 jurisdictions had higher average
scores than students in the United States.

On the informational subscale, the U.S. average was 
higher than the average in 21 jurisdictions and lower 
than the average in 12 jurisdictions. 

Changes between 2001 and 2006 

As shown in table 2, average scores for U.S. fourth- 
grade students on the combined reading literacy scale 
did not measurably differ between 2001 and 2006. 
Average scores for the literary and informational sub- 
scales in 2006 also did not measurably differ from the 
average scores in 2001. 

Of the 29 jurisdictions that participated in PIRLS in 
both 2001 and 2006, 8 (Germany; Hong Kong, SAR; 
Hungary; Italy; the Russian Federation; Singapore; the 
Slovak Republic; and Slovenia) saw increases in their 
average combined reading literacy scores.⁸ Average
scores on the combined reading literacy scale declined 
from 2001 to 2006 in England, Lithuania, Morocco, the 
Netherlands, Romania, and Sweden. 



⁸Although Kuwait participated in 2001 and 2006, the IEA
elected not to report the 2001 estimates for the country
because of concerns about the quality of Kuwait's data.
Figure 3. Average scores for fourth-grade students in participating PIRLS jurisdictions on combined
reading literacy scale, literary subscale, and informational subscale, by jurisdiction: 2006

Combined reading literacy scale    Literary subscale                  Informational subscale
Jurisdiction              Score    Jurisdiction              Score    Jurisdiction              Score
Russian Federation          565    Canada, Alberta             561    Hong Kong, SAR¹             568
Hong Kong, SAR¹             564    Russian Federation          561    Russian Federation          564
Canada, Alberta             560    Canada, British Columbia    559    Singapore                   563
Canada, British Columbia    558    Hong Kong, SAR¹             557    Luxembourg                  557
Singapore                   558    Hungary                     557    Canada, Alberta             556
Luxembourg                  557    Canada, Ontario             555    Canada, British Columbia    554
Canada, Ontario             555    Luxembourg                  555    Canada, Ontario             552
Hungary                     551    Singapore                   552    Bulgaria                    550
Italy                       551    Italy                       551    Italy                       549
Sweden                      549    Germany                     549    Sweden                      549
Germany                     548    Denmark                     547    Netherlands²                548
Belgium (Flemish)²          547    Sweden                      546    Belgium (Flemish)²          547
Bulgaria                    547    Netherlands²                545    Germany                     544
Netherlands²                547    Belgium (Flemish)²          544    Denmark                     542
Denmark                     546    Canada, Nova Scotia         543    Hungary                     541
Canada, Nova Scotia         542    Bulgaria                    542    Latvia                      540
Latvia                      541    Lithuania                   542    Canada, Nova Scotia         539
United States²              540    United States²              541    Chinese Taipei              538
England                     539    England                     539    England                     537
Austria                     538    Latvia                      539    United States²              537
Lithuania                   537    Austria                     537    Austria                     536
Chinese Taipei              535    Slovak Republic             533    New Zealand                 534
Canada, Quebec              533    Chinese Taipei              530    Canada, Quebec              533
New Zealand                 532    Canada, Quebec              529    Lithuania                   530
Slovak Republic             531    New Zealand                 527    Scotland²                   527
Scotland²                   527    Scotland²                   527    Slovak Republic             527
France                      522    Poland                      523    France                      526
Slovenia                    522    Slovenia                    519    Slovenia                    523
Poland                      519    France                      516    Poland                      515
Spain                       513    Israel                      516    Moldova                     508
Israel                      512    Spain                       516    Spain                       508
Iceland                     511    Iceland                     514    Israel                      507
Belgium (French)            500    Norway³                     501    Iceland                     505
Moldova                     500    Belgium (French)            499    Belgium (French)            498
Norway³                     498    Romania                     493    Norway³                     494
Romania                     489    Moldova                     492    Romania                     487
Georgia                     471    Georgia                     476    Georgia                     465
Macedonia                   442    Macedonia                   439    Macedonia                   450
Trinidad and Tobago         436    Trinidad and Tobago         434    Trinidad and Tobago         440
Iran                        421    Iran                        426    Iran                        420
Indonesia                   405    Indonesia                   397    Indonesia                   418
Qatar                       353    Qatar                       358    Qatar                       356
Kuwait                      330    Kuwait                      340    Morocco                     335
Morocco                     323    Morocco                     317    Kuwait                      327
South Africa                302    South Africa                299    South Africa                316
PIRLS scale average         500    PIRLS scale average         500    PIRLS scale average         500

Shading in the original figure (not reproduced) indicated whether each average was higher than, not measurably different from, or lower than the U.S. average.

¹Hong Kong, SAR, is a Special Administrative Region (SAR) of the People's Republic of China.

²Met guidelines for sample participation rates only after replacement schools were included. See appendix B for more information about
participation rates and the use of replacement schools in sampling.

³Did not meet guidelines for sample participation rates after replacement schools were included. See appendix B for more information
about participation rates and the use of replacement schools in sampling.

NOTE: Jurisdictions are ordered on the basis of average scores, from highest to lowest. Score differences as noted between the United
States and other jurisdictions are statistically significant at the .05 level of statistical significance (p < .05).

SOURCE: International Association for the Evaluation of Educational Achievement, Progress in International Reading Literacy Study
(PIRLS), 2006.
Table 2. Average scores for fourth-grade students in participating PIRLS jurisdictions on combined
reading literacy scale, literary subscale, and informational subscale, by jurisdiction: 2001
and 2006

                           Average combined        Average literary        Average informational
                           reading literacy score  subscale score          subscale score
Jurisdiction               2001      2006          2001      2006          2001      2006
Bulgaria                   550       547           550       542           551       550
Canada, Ontario            548       554           551       554           542       551*
Canada, Quebec             537       533           534       529           541       533
England                    553       539*          559       539*          546       537*
France                     525       522           518       516           533       526*
Germany                    539       548*          537       549*          538       544*
Hong Kong, SAR¹            528       564*          518       557*          537       568*
Hungary                    543       551*          548       557*          537       541
Iceland                    512       511           520       514*          504       505
Iran                       414       421           421       426           408       420*
Israel                     509       512           510       516           507       507
Italy                      541       551*          543       551*          536       549*
Latvia                     545       541           537       539           547       540*
Lithuania                  543       537*          546       542           540       530*
Macedonia                  442       442           441       439           445       450
Moldova                    492       500           480       492*          505       508
Morocco                    350       323*          347       317*          358       335
Netherlands²               554       547*          552       545*          553       548
New Zealand                529       532           531       527           525       534
Norway³                    499       498           506       501           492       494
Romania                    512       489*          512       493*          512       487*
Russian Federation         528       565*          523       561*          531       564*
Scotland²                  528       527           529       527           527       527
Singapore                  528       558*          528       552*          527       563*
Slovak Republic            518       531*          512       533*          522       527
Slovenia                   502       522*          499       519*          503       523*
Sweden                     561       549*          559       546*          559       549*
United States²             542       540           550       541           533       537

*p < .05. Significantly different from 2001 average at the .05 level of statistical significance.

¹Hong Kong, SAR, is a Special Administrative Region (SAR) of the People's Republic of China.

²Met guidelines for sample participation rates in 2006 only after replacement schools were included. See appendix B for more information
about participation rates and the use of replacement schools in sampling.

³Did not meet guidelines for sample participation rates in 2006 after replacement schools were included. See appendix B for more
information about participation rates and the use of replacement schools in sampling.

NOTE: The 2001 and 2006 estimates for Canada, Ontario shown in this table exclude private schools because only public schools were
included in the jurisdiction's 2001 sampling frame. Although Kuwait participated in 2001 and 2006, the IEA elected not to report the
2001 estimates for the country because of concerns about the quality of Kuwait's data.

SOURCE: International Association for the Evaluation of Educational Achievement, Progress in International Reading Literacy Study
(PIRLS), 2001 and 2006.
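The counts of 8 increases, 6 decreases, and 14 jurisdictions with no measurable difference can be tallied from the combined-scale columns of table 2. In this sketch the significance flags are transcribed from the table's asterisks rather than recomputed, because the standard errors are not shown in the table; the dictionary and function names are illustrative.

```python
from typing import Dict, List, Tuple

# Combined reading literacy averages from table 2: (2001, 2006, asterisk on the 2006 value?).
COMBINED: Dict[str, Tuple[int, int, bool]] = {
    "Bulgaria": (550, 547, False),
    "Canada, Ontario": (548, 554, False),
    "Canada, Quebec": (537, 533, False),
    "England": (553, 539, True),
    "France": (525, 522, False),
    "Germany": (539, 548, True),
    "Hong Kong, SAR": (528, 564, True),
    "Hungary": (543, 551, True),
    "Iceland": (512, 511, False),
    "Iran": (414, 421, False),
    "Israel": (509, 512, False),
    "Italy": (541, 551, True),
    "Latvia": (545, 541, False),
    "Lithuania": (543, 537, True),
    "Macedonia": (442, 442, False),
    "Moldova": (492, 500, False),
    "Morocco": (350, 323, True),
    "Netherlands": (554, 547, True),
    "New Zealand": (529, 532, False),
    "Norway": (499, 498, False),
    "Romania": (512, 489, True),
    "Russian Federation": (528, 565, True),
    "Scotland": (528, 527, False),
    "Singapore": (528, 558, True),
    "Slovak Republic": (518, 531, True),
    "Slovenia": (502, 522, True),
    "Sweden": (561, 549, True),
    "United States": (542, 540, False),
}

def classify_changes(rows: Dict[str, Tuple[int, int, bool]]) -> Dict[str, List[str]]:
    """Group jurisdictions by the direction of a statistically significant change."""
    groups: Dict[str, List[str]] = {"increased": [], "decreased": [], "no measurable difference": []}
    for name, (score_2001, score_2006, significant) in rows.items():
        if not significant:
            groups["no measurable difference"].append(name)
        elif score_2006 > score_2001:
            groups["increased"].append(name)
        else:
            groups["decreased"].append(name)
    return groups

for label, names in classify_changes(COMBINED).items():
    print(f"{label}: {len(names)}")
```

The tally also shows why significance cannot be read from the point estimates alone: Iran's 7-point gain is not flagged, whereas Lithuania's 6-point decline is.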
Results From the 2001 and 2006 Progress in International Reading Literacy Study (PIRLS)



Reading literacy by international benchmarks

Figure 4 shows the percentage of U.S. fourth-grade
students reaching each of the PIRLS international
benchmarks, as well as the international median
percentage of students reaching each benchmark (the
international median includes the United States). At
each benchmark, half of the PIRLS jurisdictions have
a percentage of students reaching the benchmark that
is at or above the international median, and half have
a percentage below it. For example, the international
median of 94 percent for the low benchmark indicates
that half of the jurisdictions have 94 percent or more
of their students meeting the low benchmark and half
have less than 94 percent.
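The international median is a statistic across jurisdictions: each jurisdiction contributes one percentage, regardless of how many students it has. A minimal sketch, with invented jurisdiction labels and percentages standing in for the real values:

```python
from statistics import median
from typing import Dict

def international_median(percent_reaching: Dict[str, float]) -> float:
    """Median, across jurisdictions, of the percentage of students reaching a benchmark."""
    return median(percent_reaching.values())

# Hypothetical percentages reaching the low benchmark (one value per jurisdiction).
low_benchmark = {"A": 99.0, "B": 96.0, "C": 94.0, "D": 90.0, "E": 81.0}
m = international_median(low_benchmark)
at_or_above = [name for name, pct in low_benchmark.items() if pct >= m]
print(m, at_or_above)  # 94.0 ['A', 'B', 'C']
```

With an odd number of jurisdictions, as in 2006, the median is simply the value for the middle jurisdiction once all percentages are sorted.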

For each of the four international benchmarks, the per- 
centage of U.S. students who reached the benchmark 
was higher than the international median percentage. 
Ninety-six percent of U.S. fourth-grade students met 
the low international benchmark, indicating that they 
had scores on the combined reading literacy scale equal 
to or greater than 400. Twelve percent of U.S. students 
reached the advanced benchmark, with scores equal to 
or greater than 625 (see figure 2 for the cutpoint for
each benchmark). 



Figure 4. Percentage of fourth-grade students in United States and international median who reach
PIRLS international benchmarks: 2006

Benchmark        United States   International median
Low                        96*                     94
Intermediate               82*                     76
High                       47*                     41
Advanced                   12*                      7

*p < .05. Significantly different from international median percentage at the .05 level of statistical significance.

NOTE: The United States met guidelines for sample participation rates after replacement schools were included. See appendix B for more
information about participation rates and the use of replacement schools in sampling. The international median represents all participat-
ing PIRLS jurisdictions, including the United States.

SOURCE: International Association for the Evaluation of Educational Achievement, Progress in International Reading Literacy Study
(PIRLS), 2006.







The Reading Literacy of U.S. Fourth-Grade Students in an International Context 



Reading Literacy and Student Background Characteristics



To examine how reading literacy varies across students,
PIRLS collects information on student background char- 
acteristics. Because many background characteristics 
are unique to each jurisdiction, comparisons between 
students in the United States and students interna- 
tionally are discussed only for sex in this section. In 
addition to sex, information about student race and 
ethnicity was obtained in the United States and is also 
discussed in this section. 

Sex 

In 2006, in all but two jurisdictions (Luxembourg and 
Spain), average scores for girls on the combined read- 
ing literacy scale were higher than average scores for 
boys (figure 5). In the United States, girls on average 
scored 10 points higher than boys (545 versus 535).
Internationally, the average score for girls was 17 
points higher than the average score for boys. 



The effect size for the difference between girls and boys on the combined reading literacy scale was .14. See appendix B for a discussion of effect sizes.
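Appendix B defines how the report computes effect sizes, but that definition is not reproduced in this section. The sketch below therefore assumes one common form, Cohen's d: the difference between two group means divided by their pooled standard deviation. The standard deviations and sample sizes in the test are hypothetical.

Working backward from the reported figures, a 10-point gap with an effect size of .14 implies a pooled standard deviation of roughly 10 / .14, or about 71 points. This is approximate, because both inputs are rounded.

```python
import math

def pooled_sd(sd1: float, sd2: float, n1: int, n2: int) -> float:
    """Pooled standard deviation of two groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))

def cohens_d(mean1: float, mean2: float, sd1: float, sd2: float, n1: int, n2: int) -> float:
    """Standardized mean difference (Cohen's d) of group 1 minus group 2."""
    return (mean1 - mean2) / pooled_sd(sd1, sd2, n1, n2)

# Pooled SD implied by the reported girls-boys gap and effect size.
implied_sd = (545 - 535) / 0.14
print(round(implied_sd, 1))
```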



Average scores for girls were also higher than average scores for boys on the literary subscale in all jurisdictions with the exception of Iran. In all but five jurisdictions (Belgium (French), Hungary, Italy, Luxembourg, and Spain), girls had higher scores than boys on the informational subscale. In the United States, average scores for girls were 12 points higher than average scores for boys on the literary subscale (547 versus 534) and 9 points higher on the informational subscale (542 versus 532).

Average scores for U.S. girls (545) and U.S. boys (535) 
on the combined reading literacy scale were higher 
than the international averages for girls (509) and boys 
(492). In addition, the average score for U.S. fourth- 
grade girls on the combined reading literacy scale was 
higher than the scores for girls in 20 jurisdictions. Girls 
in 10 jurisdictions had average scores higher than the 
average score for U.S. girls on the combined reading 
literacy scale. 

The average score for U.S. boys on the combined read- 
ing literacy scale was higher than the average score for 
boys in 21 jurisdictions, and boys in 9 jurisdictions had 
average scores higher than the U.S. average. 
Figure 5. Difference in average scores between fourth-grade boys and girls in participating PIRLS jurisdictions on combined reading literacy scale, by jurisdiction: 2006

[Chart: horizontal bars showing the average score difference between boys and girls; x-axis: average score difference, with "Boys score higher" to the left and "Girls score higher" to the right.

Jurisdictions, from largest to smallest difference: Kuwait; Qatar; South Africa; Trinidad and Tobago; New Zealand; Latvia; Scotland¹; Bulgaria; Canada, Nova Scotia; Macedonia; Indonesia; England; Iceland; Norway²; Slovenia; Lithuania; Morocco; Sweden; Georgia; Poland; Singapore; Israel; Russian Federation; Denmark; Iran; Moldova; Romania; Canada, Ontario; Canada, Quebec; Chinese Taipei; France; Slovak Republic; Austria; Hong Kong, SAR³; United States¹; Canada, British Columbia; Canada, Alberta; Germany; Italy; Netherlands¹; Belgium (Flemish)¹; Belgium (French); Hungary; Spain⁴; Luxembourg⁴.

Legible values: Kuwait 37*, Qatar 36*, South Africa 31*, Trinidad and Tobago 24*, New Zealand 23*; international average 17*.]

*p < .05. Average score for girls is significantly different from the average score for boys at the .05 level of statistical significance.

¹Met guidelines for sample participation rates only after replacement schools were included. See appendix B for more information about participation rates and the use of replacement schools in sampling.

²Did not meet guidelines for sample participation rates after replacement schools were included. See appendix B for more information about participation rates and the use of replacement schools in sampling.

³Hong Kong, SAR, is a Special Administrative Region (SAR) of the People's Republic of China.

⁴Difference in average scores between boys and girls is not statistically significant.

NOTE: Jurisdictions are ordered on the basis of score differences between boys and girls, from largest to smallest difference. Differences 
were computed using unrounded numbers. 

SOURCE: International Association for the Evaluation of Educational Achievement, Progress in International Reading Literacy Study 
(PIRLS), 2006. 
Race/ethnicity

In 2006, average scores for U.S. students on the combined reading literacy scale and the two subscales measurably differed on the basis of the race and ethnicity of students (table 3).

On the combined reading literacy scale, average scores were higher for the following groups:
- White, non-Hispanic (560)
- Asian, non-Hispanic (567)
- non-Hispanic students in the racial groups classified as other (573) (see appendix B for race/ethnicity classification)

These were higher than the scores for the following groups:
- Black, non-Hispanic (503)
- Hispanic (518)
- American Indian/Alaska Native, non-Hispanic (468)

Among non-Hispanic students, there were no measurable differences in average scores on the combined reading literacy scale among students in the White, Asian, and other groups. Hispanic students had higher average scores than Black, non-Hispanic students and American Indian/Alaska Native, non-Hispanic students. Average scores for Black, non-Hispanic students were lower than the scores for all other non-Hispanic groups, with the exception of American Indian/Alaska Native students.

The effect size for the difference between White, non-Hispanic students and Black, non-Hispanic students was .83. The effect size between White, non-Hispanic students and Hispanic students was .61. See appendix B for a discussion of effect sizes.



Table 3. Average scores for U.S. fourth-grade students on combined reading literacy scale, literary subscale, and informational subscale, by race/ethnicity: 2006

Race/ethnicity¹                                Combined scale   Literary subscale   Informational subscale
White, non-Hispanic                                 560                562                  555
Black, non-Hispanic                                 503                501                  505
Hispanic                                            518                517                  517
Asian, non-Hispanic                                 567                569                  561
American Indian/Alaska Native, non-Hispanic         468                468                  472
Other, non-Hispanic                                 573                567                  571

¹The Other, non-Hispanic category includes Pacific Islander students and non-Hispanic students who identified multiple races. Students who identified themselves as being of Hispanic origin were classified as Hispanic, regardless of their race.

NOTE: Estimates for race/ethnicity in 2001 are not shown because the classification of racial/ethnic categories and procedures for collecting data on race/ethnicity changed between 2001 and 2006. The United States met guidelines for sample participation rates after replacement schools were included. See appendix B for more information about participation rates and the use of replacement schools in sampling.

SOURCE: International Association for the Evaluation of Educational Achievement, Progress in International Reading Literacy Study (PIRLS), 2006.
Results From the 2001 and 2006 Progress in International Reading Literacy Study (PIRLS)

Reading Literacy and School and Classroom Characteristics

Reading literacy may differ across students according to a variety of factors, including characteristics of the schools and classrooms that students attend. To help examine the relationship between school and classroom characteristics and reading literacy, PIRLS collected information from school administrators and teachers in the United States on different aspects of their schools and classrooms.

Note that these data, as with all data presented in 
this report, are used to describe relationships between 
variables. These data are not intended, nor can they be 
used in this context, to imply causality. 

Control of school 

Among U.S. students in 2006, the average score for students in private schools (561) was higher than the average score for students in public schools (538) for the combined reading literacy scale. Average scores for students in both U.S. public and private schools were higher than the PIRLS scale average (500) for the combined scale and the two subscales.

School poverty level 

In the United States, the poverty level of a school was measured by asking school administrators to estimate the percentage of students in their schools who were eligible for free or reduced-price lunch. See appendix B for a discussion of the relationship between poverty levels and the National School Lunch Program.

Of U.S. students in public schools:
- 2 percent were enrolled in schools with no students eligible for free or reduced-price lunch;
- 87 percent were in schools with some students eligible;
- 11 percent were in schools with all students eligible.

The effect size for the difference between public and private schools was .33. See appendix B for a discussion of effect sizes.

Among U.S. students in public schools, the average score on the combined reading literacy scale for students in schools with no students eligible for free or reduced-price lunch was 93 points higher than the average score for students in schools in which all students were eligible (figure 6). The average score for students in schools with some students eligible for free or reduced-price lunch was also higher than the average score for students in schools in which all students were eligible for free or reduced-price lunch.
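A minimal sketch of the three-way grouping of schools by free or reduced-price lunch eligibility (none, some, or all eligible). It assumes each school's reported percentage of eligible students is available as a number from 0 to 100. The helper names and the records in the test are hypothetical, and the group means are unweighted, unlike the report's estimates.

```python
from collections import defaultdict
from statistics import mean

def lunch_category(pct_eligible: float) -> str:
    """Group a school as 'none', 'some', or 'all' by the percentage of its
    students eligible for free or reduced-price lunch."""
    if not 0 <= pct_eligible <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if pct_eligible == 0:
        return "none"
    if pct_eligible == 100:
        return "all"
    return "some"

def mean_score_by_category(records):
    """Unweighted mean score per category from (pct_eligible, score) pairs."""
    groups = defaultdict(list)
    for pct, score in records:
        groups[lunch_category(pct)].append(score)
    return {category: mean(scores) for category, scores in groups.items()}
```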

Instructional practices related to 
reading 

According to reports from school administrators, 95 per- 
cent of U.S. students attended schools with informal 
initiatives to encourage reading. The percentage of U.S. 
students in schools with informal initiatives was 15 
percentage points higher than the international average 
(80 percent) and also higher than the percentage of 
students in such schools in 30 other jurisdictions. 



The effect size for the difference between the some and all categories of free or reduced-price lunch participation was .70. See appendix B for a discussion of effect sizes.
As indicated in figure 7, the percentage of students in the United States whose teachers reported teaching reading for more than 6 hours per week (68 percent) was higher than the international average (25 percent). Moreover, the percentage of U.S. students receiving more than 6 hours of instruction per week was higher than the percentage of students receiving the same amount of instruction in every other participating PIRLS jurisdiction.



Figure 6. Average scores for U.S. fourth-grade students in public schools on combined reading literacy scale, by school enrollment eligible for free or reduced-price lunch: 2006

[Chart: average scores by school enrollment eligible for free or reduced-price lunch, with reference lines for the U.S. average and the PIRLS scale average.]

NOTE: Results based on information collected from school administrators. The PIRLS scale average represents all participating PIRLS jurisdictions, including the United States. The United States met guidelines for sample participation rates after replacement schools were included. See appendix B for more information about participation rates and the use of replacement schools in sampling.

SOURCE: International Association for the Evaluation of Educational Achievement, Progress in International Reading Literacy Study (PIRLS), 2006.



Although the amount of reading instruction may 
vary across students and schools, average scores for 
U.S. students on the combined reading literacy scale 
did not measurably differ by the amount of reading 
instruction received. 



Figure 7. Percentage distribution of fourth-grade students in the United States and internationally receiving reading instruction each week, by average number of hours spent on reading instruction each week: 2006

[Chart: percent of students by average number of hours of reading instruction per week; categories: up to and including 3 hours, more than 3 and including 6 hours, more than 6 hours.]

*p < .05. Significantly different from international percentage at the .05 level of statistical significance.

NOTE: Results based on information collected from teachers. The United States met guidelines for sample participation rates after replacement schools were included. See appendix B for more information about participation rates and the use of replacement schools in sampling. Detail may not sum to totals because of rounding.

SOURCE: International Association for the Evaluation of Educational Achievement, Progress in International Reading Literacy Study (PIRLS), 2006.



Teacher preparation and 
experience 

Teachers of sampled U.S. students reported whether 
they were certified to teach and the number of 
years they had been teaching. Nearly all U.S. fourth- 
grade students (99 percent) were taught by certified 
teachers; the U.S. percentage was higher than the 
international average (97 percent). Nineteen jurisdic- 
tions reported that 100 percent of their fourth-grade 
students were taught by certified teachers. 



On average, U.S. fourth-grade teachers had fewer years 
of teaching experience (12 years) than the international 
average (17 years). The U.S. average was lower than the 
average years of teaching experience in 35 of the partici- 
pating PIRLS jurisdictions. Average teaching experience 
was lower in the United States not only relative to most 
other participating jurisdictions but also relative to the 
last administration of PIRLS: Between 2001 and 2006, 
the average years of experience for fourth-grade teachers 
in the United States decreased from 15 to 12 years. 
Home Environment for Reading

Students in all participating PIRLS jurisdictions, including the United States, were asked to answer a variety of questions related to their home environment for reading. Students reported the types of materials they read outside of school, as well as the frequency with which they read these materials.

Reading activities outside of school 

As indicated in table 4, students in the United States were more likely to read stories or novels every day or almost every day (36 percent) than to read for information every day or almost every day (14 percent). The percentage of U.S. students who read stories or novels every day or almost every day was 4 percentage points higher than the international average (32 percent). However, the percentage of U.S. students who read for information every day or almost every day was 2 percentage points lower than the international average (16 percent).

The average score on the combined reading literacy 
scale for U.S. students who read stories or novels every 
day or almost every day (558) was higher than the aver- 
age score for students who read stories or novels once 
or twice a week (541), once or twice a month (539), 
and never or almost never (509). In contrast, the aver- 
age score for students who read for information every 
day or almost every day (519) was lower than the aver- 
age score for students who read for information once or 
twice a week (538), once or twice a month (553), and 
never or almost never (546). 



Table 4. Percentage distribution of fourth-grade students in the United States and internationally who read stories or novels or read for information, by frequency of reading outside of school: 2006

Frequency and type of reading        United States   Internationally
Stories or novels
  Every day/almost every day              36*              32
  Once or twice a week                    28*              31
  Once or twice a month                   18               18
  Never/almost never                      18               19
Information
  Every day/almost every day              14*              16
  Once or twice a week                    43               43
  Once or twice a month                   33*              29
  Never/almost never                      10*              12

*p < .05. Significantly different from international percentage at the .05 level of statistical significance.

NOTE: The United States met guidelines for sample participation rates after replacement schools were included. See appendix B for more information about participation rates and the use of replacement schools in sampling. Detail may not sum to totals because of rounding.

SOURCE: International Association for the Evaluation of Educational Achievement, Progress in International Reading Literacy Study (PIRLS), 2006.






The pattern seen among U.S. students, in which those who read for information less frequently had higher scores than those who read for information every day or almost every day, was also observed internationally.

The international averages on the combined reading literacy scale were as follows:
- 503 for students who read for information once or twice a week;
- 506 for those who did so once or twice a month;
- 496 for those who never or almost never did so.

In contrast, the international average for students who read for information every day or almost every day was 492.

Note that these data, as with all data presented in 
this report, are used to describe relationships between 
variables. These data are not intended, nor can they be 
used in this context, to imply causality. 

Estimates and standard errors for international comparisons are available in Mullis et al. (2007).
Appendix A: Sample Items From PIRLS 2006

This appendix contains a sample reading passage from PIRLS 2006 as well as several of the assessment items associated with the passage. The assessment items show actual student responses and also compare the performance of U.S. fourth-grade students on each item with the international average. The items illustrate acceptable performance at each of the PIRLS international benchmarks (low, intermediate, high, and advanced). The reading passage and all associated items have been publicly released by the IEA.




An Unbelievable Night

by Franz Hohler

Anina was ten years old, so even half asleep she could find her way from her room to the bathroom. The door to her room was usually open a crack, and the nightlight in the hallway made it light enough to get to the bathroom past the telephone stand.

One night, as she passed the telephone stand on her way to the bathroom, Anina heard something that sounded like a quiet hissing. But, because she was half asleep, she didn't really pay any attention to it. Anyway, it came from pretty far away. Not until she was on her way back to her room did she see where it came from. Under the telephone stand there was a large pile of old newspapers and magazines, and this pile now began to move.

That was where the noise was coming from. All of a sudden the pile started to fall over - right, left, forwards, backwards - then there were newspapers and magazines all over the floor.

Anina could not believe her eyes as she watched a grunting and snorting crocodile come out from under the telephone stand.

Anina was frozen to the spot. Her eyes wide as saucers, she watched the crocodile crawl completely out of the newspapers and slowly look around the apartment. It seemed to have just come out of the water because its whole body was dripping wet. Wherever the crocodile stepped, the carpet under it became drenched.
The crocodile moved its head back and forth letting out a loud hissing sound. Anina swallowed hard, looking at the crocodile's snout with its terribly long row of teeth. It swung its tail slowly back and forth. Anina had read about that in "Animal Magazine" - how the crocodile whips the water with its tail to chase away or attack its enemies.

Her gaze fell on the last issue of "Animal Magazine," which had fallen from the pile and was lying at her feet. She got another shock. The cover of the magazine used to have a picture of a big crocodile on a river bank. The river bank was now empty!

Anina bent down and picked up the magazine. At that moment the crocodile whipped his tail so hard that he cracked the big vase of sunflowers on the floor and the sunflowers scattered everywhere. With a quick jump Anina was in her bedroom. She slammed the door shut, grabbed her bed and pushed it up against the door. She had built a barricade that would keep her safe from the crocodile. Relieved, she let her breath out.

But then she hesitated. What if the beast was simply hungry? Maybe to make the crocodile go away you had to give it something to eat?

Anina looked again at the animal magazine. If the crocodile could crawl out of a picture then perhaps other animals could too. Anina hastily flipped through the magazine and stopped at a swarm of flamingos in a jungle swamp. Just right, she thought. They look like a birthday cake for crocodiles.

Suddenly there was a loud crack and the tip of the crocodile's tail pushed through the splintered door.

Quickly, Anina held the picture of the flamingos up to the hole in the door and called as loud as she could, "Get out of the swamp! Shoo! Shoo!" Then she threw the magazine through the hole into the hallway, clapped her hands and yelled and screamed.




She could hardly believe what happened next. The entire hallway was suddenly filled with screeching flamingos wildly flapping their wings and running around all over the place on their long, bony legs. Anina saw one bird with a sunflower in its beak and another grabbing her mother's hat from its hook. She also saw a flamingo disappear into the crocodile's mouth. With two quick bites he swallowed the flamingo and quickly followed it with another, the one with the sunflower in its beak.

After two portions of flamingo the crocodile seemed to have had enough and lay down contentedly in the middle of the hallway. When he had closed his eyes and no longer moved, Anina quietly opened her door and slipped through it into the hallway. She placed the empty magazine cover in front of the crocodile's nose. "Please," she whispered, "please go back home." She crept back into the bedroom and looked through the hole in the door. She saw the crocodile back on the cover of the magazine.

She now went cautiously into the living room where the flamingos were crowded around the sofa and standing on the television. Anina opened the magazine to the page with the empty picture. "Thank you," she said, "thank you very much. You may now go back to your swamp."

In the morning, it was very difficult for her to explain the giant wet spot on the floor and the broken door to her parents. They weren't convinced about the crocodile even though her mother's hat was nowhere to be found.
Figure A-1. Example A of item at PIRLS low international benchmark: 2006

1 Point: Full-credit sample response

7. How did the bedroom door get broken?

A. The crocodile’s tail pushed through it. (selected)
B. The big vase cracked against it.
C. The flamingo’s sharp beak crashed into it.
D. The bed smashed against it.

Percentage of students earning full credit
International average   77
United States           83*

*p < .05. Significantly different from international average at the .05 level of statistical significance.

SOURCE: International Association for the Evaluation of Educational Achievement, Progress in International Reading Literacy Study (PIRLS), 2006.



Figure A-2. Example B of item at PIRLS low international benchmark: 2006

1 Point: Full-credit sample response

9. At the end of the story, how did Anina feel toward the flamingos?

A. guilty
B. cautious
C. grateful (selected)
D. annoyed

Percentage of students earning full credit
International average   69
United States           61*

*p < .05. Significantly different from international average at the .05 level of statistical significance.

SOURCE: International Association for the Evaluation of Educational Achievement, Progress in International Reading Literacy Study (PIRLS), 2006.
Figure A-3. Example of item at PIRLS intermediate international benchmark: 2006

1 Point: Full-credit sample response

5. Put the following sentences in the order in which they happened in the story. The first one has been done for you.

2  Anina saw the crocodile.
4  The crocodile ate two flamingos.
5  Anina tried to explain the broken door to her parents.
1  Anina started to walk to the bathroom.
3  Anina ran to the bedroom and slammed the door.

Percentage of students earning full credit
International average   67
United States           79*

*p < .05. Significantly different from international average at the .05 level of statistical significance.

SOURCE: International Association for the Evaluation of Educational Achievement, Progress in International Reading Literacy Study (PIRLS), 2006.



Figure A-4. Example of item at PIRLS high international benchmark: 2006

2 out of 2 Points: Full-credit sample response

8. How did the magazine help Anina? Write two ways.

[Handwritten full-credit student response; not legible in this copy.]

Percentage of students earning full credit
International average   41
United States           54*

*p < .05. Significantly different from international average at the .05 level of statistical significance.

SOURCE: International Association for the Evaluation of Educational Achievement, Progress in International Reading Literacy Study (PIRLS), 2006.
Figure A-5. Example of item at PIRLS advanced international benchmark: 2006

3 out of 3 Points: Full-credit sample response

11. You learn what Anina was like from the things she did. Describe what she was like and give two examples of what she did that show this.

[Handwritten full-credit student response; not reproduced in this copy.]

Percentage of students earning full credit
International average   16
United States           22*

*p < .05. Significantly different from international average at the .05 level of statistical significance.

SOURCE: International Association for the Evaluation of Educational Achievement, Progress in International Reading Literacy Study (PIRLS), 2006.
Appendix B: Technical Notes

Introduction

This appendix describes the sampling, data collection, test development and administration, weighting and variance estimation, scaling, and statistical testing procedures used to collect and analyze the data for the 2006 Progress in International Reading Literacy Study (PIRLS). Forty-five jurisdictions participated in PIRLS 2006, which collected data on the reading literacy of students in their fourth year of schooling (fourth-grade students in most participating jurisdictions, including the United States).

PIRLS 2006 is the second administration of the study, which was first administered in 2001. The study is conducted by the International Association for the Evaluation of Educational Achievement (IEA), with national sponsors in each participating jurisdiction. In the United States, PIRLS is sponsored by the National Center for Education Statistics (NCES), in the Institute of Education Sciences in the U.S. Department of Education. Further information about the technical aspects of the assessment is available in the international PIRLS 2006 technical report (Martin, Mullis, and Kennedy 2007).



Sampling, data collection, and 
response rate benchmarks 

The PIRLS 2006 international project team instituted a 
series of sampling, data collection, and response rate 
benchmarks to ensure international comparability and 
to provide the ability to produce precise estimates of 
the main criterion variables for all jurisdictions. 

The target population for PIRLS was defined by IEA using the International Standard Classification of Education (ISCED), developed by the United Nations Educational, Scientific, and Cultural Organization (UNESCO 1999). The target population of interest was all students enrolled in the grade corresponding to the fourth year of schooling, beginning with ISCED level 1. For most jurisdictions, this was the fourth grade or its national equivalent.

This definition differs from the one used in 2001, which targeted students in the upper of the two grades that included the most 9-year-olds; in most jurisdictions, that was also the fourth grade. Table B-1 provides information on ISCED levels for the United States.



Table B-1. International Standard Classification of Education (ISCED) levels, definitions, and U.S. equivalents in preprimary through 12th grade

ISCED level   Definition        U.S. equivalent
0             Preprimary        Kindergarten and below
1             Primary           1st-6th grades
2             Lower secondary   7th-9th grades
3             Upper secondary   10th-12th grades or first 3 years of vocational education

SOURCE: Matheson, N., Salganik, L., Phelps, R., Perie, M., Alsalam, N., and Smith, T. (1996). Education Indicators: An International Perspective. U.S. Department of Education. Washington, DC: National Center for Education Statistics.
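The 2006 target-population rule defines the target grade as the fourth year of schooling, counting from the start of ISCED level 1. The sketch below applies that rule to the U.S. grade ranges in table B-1.

This is an illustrative sketch only, and the names are hypothetical. It codes kindergarten as grade 0 and earlier preprimary years as negative numbers; that coding is an assumption made for the example.

```python
# U.S. grade ranges for ISCED levels 1-3, from table B-1.
# Assumption: kindergarten is coded as grade 0 and earlier years as negative.
ISCED_US_GRADES = {1: range(1, 7), 2: range(7, 10), 3: range(10, 13)}

def isced_level(grade: int) -> int:
    """Return the ISCED level for a U.S. grade, per table B-1."""
    if grade <= 0:  # kindergarten and below: preprimary (level 0)
        return 0
    for level, grades in ISCED_US_GRADES.items():
        if grade in grades:
            return level
    raise ValueError(f"grade {grade} is beyond 12th grade")

def pirls_2006_target_grade() -> int:
    """Grade corresponding to the fourth year of schooling in ISCED level 1."""
    first_year = ISCED_US_GRADES[1].start
    return first_year + 3

print(pirls_2006_target_grade(), isced_level(pirls_2006_target_grade()))
```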
Standardized procedures for sampling were developed by IEA and disseminated in a school sampling manual. Statistics Canada was responsible for approving the designs and verifying the samples of all participating jurisdictions.

The basic sample design called for a two-stage stratified cluster design, with schools selected at the first stage and classrooms at the second stage:
- Schools were sampled using a probability-proportionate-to-size sampling method. Within each jurisdiction, a minimum of 150 schools were selected.
- Information on the number of classrooms containing fourth-grade students, and on the size of the classes, was collected from participating schools and entered into the within-school sampling software provided by IEA. In most jurisdictions, one or two classes per school were randomly selected using this software.
- All students in sampled classrooms were selected.
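The two-stage design described above can be sketched as follows. The section does not specify how the probability-proportionate-to-size selection was carried out. This sketch therefore assumes systematic PPS selection over a list of school sizes, a common implementation, followed by simple random selection of classes within each sampled school.

Function names, school sizes, and class labels are hypothetical. Real designs also treat very large schools as certainty selections, which this sketch does not.

```python
import random
from typing import List, Sequence

def pps_systematic(sizes: Sequence[float], n: int, rng: random.Random) -> List[int]:
    """Stage 1: select n schools with probability proportional to size.

    Uses systematic selection along the cumulative size total. A school whose
    size exceeds the sampling interval can be hit more than once; production
    designs treat such schools as certainty selections instead.
    """
    total = sum(sizes)
    interval = total / n
    start = rng.random() * interval  # random start in [0, interval)
    points = [start + k * interval for k in range(n)]
    selected, cumulative, i = [], 0.0, 0
    for index, size in enumerate(sizes):
        cumulative += size
        # Every selection point falling inside this school's size band picks it.
        while i < n and points[i] < cumulative:
            selected.append(index)
            i += 1
    return selected

def select_classes(classes: Sequence[str], k: int, rng: random.Random) -> List[str]:
    """Stage 2: randomly select k classes (or all of them if fewer exist)."""
    return rng.sample(list(classes), min(k, len(classes)))

rng = random.Random(2006)
school_sizes = [120, 45, 300, 80, 60, 210, 95, 150]  # hypothetical fourth-grade enrollments
for school in pps_systematic(school_sizes, 3, rng):
    print(school, select_classes(["4A", "4B", "4C"], 2, rng))
```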

IEA also established sample size and response rate targets for all jurisdictions. As table B-2 shows, the response rate target for schools was set at 85 percent, with a minimum participation rate among "original sample schools" of 50 percent.

When the original sample was drawn, the schools immediately before and immediately after each sampled school on the sampling frame were designated "replacement" schools. These were contacted if the original sample school refused to participate.

The response rate target for classrooms was 95 percent, and the target student response rate was set at 85 percent. In addition, classrooms with student participation below 50 percent were to be excluded from the final data. Substitution of sampled classrooms was not permitted, and a school would be classified as a non-respondent if no other classrooms had been sampled. No U.S. schools were classified as non-respondents on the basis of these criteria.
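A sketch of the replacement-school rule, assuming the sampling frame is an ordered list and schools are identified by their position in it. Two details are assumptions not stated in this section. First, a neighbor that is itself an original sample school is not used as a replacement. Second, the order in which the two replacements are contacted is left open. The function name is hypothetical.

```python
from typing import Dict, List, Sequence

def assign_replacements(frame_size: int, sampled: Sequence[int]) -> Dict[int, List[int]]:
    """Map each sampled frame position to its replacement schools: the schools
    immediately before and immediately after it on the sorted frame."""
    sampled_set = set(sampled)
    plan: Dict[int, List[int]] = {}
    for pos in sampled:
        plan[pos] = [
            neighbor
            for neighbor in (pos - 1, pos + 1)
            # Stay on the frame and skip neighbors that are original sample schools.
            if 0 <= neighbor < frame_size and neighbor not in sampled_set
        ]
    return plan

print(assign_replacements(12, [0, 5, 8, 11]))
```

Schools at the ends of the frame have only one neighbor, so they receive a single replacement in this sketch.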

The IEA's minimum acceptable rate for overall sample
participation after replacement (the product of the
school participation rate and the student participation
rate) was 75 percent. In 2006, the overall sample
participation rate for Norway was 71 percent.
Consequently, all data reported for Norway in this report
have the following footnote: "Did not meet guidelines
for sample participation rates after replacement schools
were included."
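The overall participation rule is a simple product compared with a threshold. The sketch below applies it to the U.S. weighted rates after replacement given later in this appendix (86 percent of schools and 95 percent of students); because those inputs are already rounded, the result is approximate. The function names are illustrative only.

```python
MINIMUM_OVERALL_RATE = 0.75  # IEA guideline for participation after replacement


def overall_participation(school_rate, student_rate):
    """Overall participation = school rate x student rate (both as proportions)."""
    return school_rate * student_rate


def meets_participation_guideline(school_rate, student_rate,
                                  minimum=MINIMUM_OVERALL_RATE):
    """True if the overall participation rate reaches the minimum."""
    return overall_participation(school_rate, student_rate) >= minimum


us_overall = overall_participation(0.86, 0.95)
print(f"U.S. overall participation: {us_overall:.1%}")  # prints "U.S. overall participation: 81.7%"
```

Rounded to a whole percent, this gives the 82 percent shown for the United States in Table B-3.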

The goal of the study was to provide 95 percent cover- 
age of the target population within each jurisdiction. 
Jurisdictions that excluded more than 5 percent of 
students for any reason are noted in the international 
report as having less than full coverage of the target 
population. 

Sampling, data collection, and 
response rates in the United States 

Sampling 

The PIRLS sample in the United States was designed
to be representative of all fourth-grade students in the
50 states and the District of Columbia. In addition to
the base sample (designed to yield 150 participating
schools), the United States sampled additional private
schools and high-poverty schools (defined as schools in
which 50 percent or more of students were eligible to
receive free or reduced-price lunch) to increase the
precision of the estimates for these subgroups. The U.S.
sample was designed to yield 180 participating schools.

Table B-2. IEA minimum sample size and unweighted response rate targets for participating PIRLS jurisdictions: 2006

Group        Minimum sample size       Unweighted response
             (number)                  rate (percent)
Schools      150                       85¹
Classrooms   1 per sampled school      95
Teachers     1 per sampled school      85
Students     4,500                     85

¹At least 50 percent must be original sample schools.

SOURCE: International Association for the Evaluation of Educational Achievement, Progress in International Reading Literacy Study (PIRLS), 2006.

The PIRLS school sample was drawn in March 2005. The 
sampling frame was constructed using data from the 
2002-03 Common Core of Data (CCD) and Preliminary 
Data 2003-04 Private School Universe Survey (PSS). 

To be consistent with the sampling design for PIRLS 
2001, the frame was divided into two parts: (1) one
stratum was created that included schools located in 
the 10 most populous Metropolitan Statistical Areas 
(MSAs); (2) all schools outside those MSAs were grouped 
into 451 Primary Sampling Units (PSUs) by sorting on 
MSA and then by the Federal Information Processing 
Standards (FIPS) code. PSUs were designed to fit within 
state boundaries and, where possible, within county 
and city boundaries. In the United States, schools were 
sorted by state, percentage of racial/ethnic minority 
students, control of school (public/private), percentage 
of students eligible for free or reduced-price lunch,
and locale before the selection process. 
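Sorting the frame before a systematic selection spreads the sample across the sort variables (often called implicit stratification). A minimal sketch of that ordering step is below; it assumes a plain ascending sort on the variables in the order listed in the text, and the record layout and example schools are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FrameSchool:
    """One row of a hypothetical sampling frame."""
    school_id: str
    state: str
    pct_minority: float   # percentage of racial/ethnic minority students
    control: str          # "public" or "private"
    pct_frl: float        # percentage eligible for free or reduced-price lunch
    locale: str           # reporting locale of the school address
    grade4_enrollment: int


def sort_frame(schools):
    """Order the frame by state, minority %, control, FRL %, and locale."""
    return sorted(
        schools,
        key=lambda s: (s.state, s.pct_minority, s.control, s.pct_frl, s.locale),
    )


frame = [
    FrameSchool("C", "VA", 40.0, "public", 55.0, "rural/small town", 60),
    FrameSchool("A", "OH", 12.5, "private", 5.0, "central city", 30),
    FrameSchool("B", "OH", 12.5, "public", 30.0, "urban fringe/large town", 90),
]
ordered = [s.school_id for s in sort_frame(frame)]
```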

Locale was determined on the basis of a sampled 
school's address. School addresses were classified into 
one of three categories (central city, urban fringe/large 
town, or rural/small town) using the NCES locale code 
system in use at the time of sampling. The locale code 
system used the following designations: 

• Large city: A central city of a Consolidated 
Metropolitan Statistical Area (CMSA) or MSA, with 
the city having a population greater than or equal 
to 250,000. 

• Midsize city: A central city of a CMSA or MSA, with 
the city having a population less than 250,000. 

• Urban fringe of a large city: Any territory within a 
CMSA or MSA of a large city and defined as urban 
by the Census Bureau. 

• Urban fringe of a midsize city: Any territory within 
a CMSA or MSA of a midsize city and defined as 
urban by the Census Bureau. 



• Large town: An incorporated place or Census-desig- 
nated place with a population greater than or equal 
to 25,000 and located outside a CMSA or MSA. 

• Small town: An incorporated place or Census-desig- 
nated place with a population less than 25,000 and 
greater than or equal to 2,500 and located outside 
a CMSA or MSA. 

• Rural, outside MSA: Any territory designated as
rural by the Census Bureau that is outside a CMSA 
or MSA of a large or midsize city. 

• Rural, inside MSA: Any territory designated as rural
by the Census Bureau that is within a CMSA or MSA 
of a large or midsize city. 

For this analysis, large city and midsize city were
combined to form central city; urban fringe of a large
city, urban fringe of a midsize city, and large town were
combined to form urban fringe/large town; and small
town, "rural, outside MSA," and "rural, inside MSA" were
combined to form rural/small town.
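The collapsing of the eight NCES locale designations into the three reporting groups used in this report can be written as a lookup table. The sketch below is illustrative; the labels are plain strings chosen for readability, not codes from an NCES file.

```python
# Map from the NCES locale designations listed above to the three reporting groups.
LOCALE_TO_REPORTING_GROUP = {
    "Large city": "Central city",
    "Midsize city": "Central city",
    "Urban fringe of a large city": "Urban fringe/large town",
    "Urban fringe of a midsize city": "Urban fringe/large town",
    "Large town": "Urban fringe/large town",
    "Small town": "Rural/small town",
    "Rural, outside MSA": "Rural/small town",
    "Rural, inside MSA": "Rural/small town",
}


def reporting_locale(designation):
    """Return the reporting group for an NCES locale designation."""
    try:
        return LOCALE_TO_REPORTING_GROUP[designation]
    except KeyError:
        raise ValueError(f"unknown locale designation: {designation!r}") from None
```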

Within each selected PSU or MSA stratum, schools were 
selected on the basis of the number of fourth-grade stu- 
dents in the school so that larger schools had a higher 
probability of selection than smaller schools. The final 
sample included 222 schools; 152 were chosen from 
PSUs and 70 were selected from the MSA stratum. The 
target number of students was designed to be similar 
across schools, both large and small, correcting for the 
greater likelihood of selection of large schools. 
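The offsetting effect described above can be checked with a small calculation. If a school is drawn with probability proportional to its fourth-grade enrollment and a roughly fixed number of students is then taken within it, the overall probability of selecting any one student is about the same in large and small schools. The stratum size, the number of schools, and the 25 students per school below are hypothetical; in PIRLS whole classrooms were taken, so the equality holds only approximately.

```python
from fractions import Fraction


def selection_probability(n_schools, stratum_enrollment, school_enrollment,
                          students_per_school):
    """Overall probability that a given student is selected.

    Stage 1: school drawn with probability n * M_i / M (PPS).
    Stage 2: m students taken within the school: m / M_i.
    """
    p_school = Fraction(n_schools * school_enrollment, stratum_enrollment)
    if p_school > 1:
        raise ValueError("school would be a certainty selection")
    p_within = Fraction(min(students_per_school, school_enrollment),
                        school_enrollment)
    return p_school * p_within


# Hypothetical stratum: 10,000 fourth-graders, 10 schools, 25 students per school.
large = selection_probability(10, 10_000, 200, 25)
small = selection_probability(10, 10_000, 50, 25)
```

Both probabilities equal n x m / M = 10 x 25 / 10,000 = 1/40.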

Data collection 

School contacting began in April 2005, approximately 
1 year prior to data collection. The suggested test- 
ing window for PIRLS in the southern hemisphere was 
October through December 2005, and in the northern
hemisphere it was March through June, 2006. The 
United States was allowed to begin early (on January 
23) to accommodate schools that wished to partici- 
pate before state-mandated tests occurred. Many U.S. 
schools also asked to participate after completing state 
tests, and so the United States was allowed to continue 
testing through June 9, 2006, resulting in a 4½-month
test window rather than the more typical 1- to 2-month
test window. The mean score of students completing the 
exam in January through March was 539.5, which was 
not significantly different from the score (541.1) of the 
students completing the exam in April through June. 

Response rates 

Of the 222 sampled schools, 214 were eligible for inclu- 
sion in PIRLS. Eight schools had closed and were des- 
ignated ineligible. Of the 214 eligible original sample 
schools, 120 participated (57 percent weighted). An 
additional 63 replacement schools were contacted and 
agreed to participate, for a total of 183 schools, or a 
weighted response rate, using final adjusted weights, of 
86 percent of eligible schools.^ Of the 120 participating 
schools from the original sample, 88 (73 percent) were 
from the PSU sample, while 40 of the 63 participat- 
ing replacement schools (63 percent) were from the 
PSU sample. The United States met the international 
guidelines for school response rate, but only after using 
replacement schools. 

Information on the number and size of classrooms 
containing fourth-grade students was collected from 
all participating schools. One or two classrooms were 
randomly selected from each school depending on the 
size of the school. Of the 256 classrooms sampled, 255 
participated, or 99 percent. There were 5,601 fourth- 
grade students enrolled in the selected classrooms; 
159 of these students were excluded from testing (see 
"Exclusions" for more information). Within these class- 
rooms, 5,442 students were eligible, and 5,190 completed 
the assessment, for a weighted student response rate of 
95 percent. The United States met the international 
guidelines for classroom and student response rates. 
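The student counts above can be checked directly. The unweighted rate (about 95.4 percent) is close to the reported weighted rate of 95 percent; the weighted rate itself uses the final adjusted weights, which are not shown here, so the weighted function below is demonstrated only on hypothetical weights.

```python
def student_response(enrolled, excluded, assessed):
    """Return the number of eligible students and the unweighted response rate."""
    eligible = enrolled - excluded
    return eligible, assessed / eligible


def weighted_response_rate(weights, responded):
    """Sum of weights of respondents divided by sum of weights of all eligibles."""
    total = sum(weights)
    return sum(w for w, r in zip(weights, responded) if r) / total


eligible, unweighted = student_response(5601, 159, 5190)
print(eligible, f"{unweighted:.1%}")  # prints "5442 95.4%"
```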



^ All weighted response rates discussed in this report refer to
final adjusted weights. Response rates were calculated using
the formula developed by the IEA for PIRLS. The standard
NCES formula for computing response rates would result in a
lower school response rate of approximately 63 percent.



In addition to having students complete the assessment
and a questionnaire, PIRLS asked teachers and school
administrators to complete questionnaires. Of the 256
teachers sampled, 249 completed teacher question- 
naires, or 97 percent. Among school administrators, 
182 of the 183 questionnaires were completed, for a 
response rate of 99 percent. 

Table B-3 presents information on the total number of 
participating schools, students assessed, and overall 
weighted response rates after replacement in all juris- 
dictions that participated in PIRLS. 

Exclusions 

Schools that were very small or that were classified as 
special education, vocational, or alternative schools 
(private and public) could be excluded from the sam- 
pling frame. In the United States these schools enrolled 
3.2 percent of the expected number of fourth-grade 
students. Table B-4 shows the percentage of students 
excluded from the sample in 2001 and 2006. 

International guidelines recognized that some students 
might not be eligible for inclusion in PIRLS because 
of limited exposure to the language of assessment 
(English in the case of the United States) or the need 
for special testing accommodations. 

Within classrooms, students were excluded from 
participation in PIRLS if they met the criteria
established by the IEA:

• Functionally disabled students. These are students 
who are permanently physically disabled in such a 
way that they cannot perform in the PIRLS testing 
situation. Functionally disabled students who could 
perform were included in the testing. 

• Intellectually disabled students. These are students
who are considered, in the professional opinion of the
school administrator or other qualified staff members,
to be intellectually disabled or who have been
psychologically tested as such. This includes students
who are emotionally or mentally unable to follow the
general instructions of the test. Students were not
excluded solely because of poor academic performance
or normal disciplinary problems.

• Non-native language speakers. These are students
who are unable to read or speak the language of the
test and would be unable to overcome the language
barrier in the test situation. Typically, students who
had received less than 1 year of instruction in the
language of the test were to be excluded, but this
definition could be adapted in different jurisdictions.
In the United States, students who had received less
than 1 year of English instruction were defined as
non-native language speakers.

Table B-3. Total number of participating schools, students assessed, and overall weighted response rates, by participating PIRLS jurisdictions: 2006

Jurisdiction               Participating  Students   Overall weighted
                           schools        assessed   response rate (percent)
Austria                    158            5,067      97
Belgium (Flemish)          137            4,479      91
Belgium (French)           150            4,552      95
Bulgaria                   143            3,863      94
Canada, Alberta            150            4,243      96
Canada, British Columbia   148            4,150      94
Canada, Nova Scotia        201            4,436      96
Canada, Ontario            180            3,988      87
Canada, Quebec             185            3,748      81
Chinese Taipei             150            4,589      99
Denmark                    145            4,001      96
England                    148            4,036      92
France                     169            4,404      95
Georgia                    149            4,402      98
Germany                    405            7,899      92
Hong Kong, SAR             144            4,712      97
Hungary                    149            4,068      97
Iceland                    128            3,673      90
Indonesia                  168            4,774      98
Iran                       236            5,411      99
Israel                     149            3,908      93
Italy                      150            3,581      97
Kuwait                     149            3,958      88
Latvia                     147            4,162      92
Lithuania                  146            4,701      92
Luxembourg                 178            5,101      99
Macedonia                  150            4,002      96
Moldova                    150            4,036      95
Morocco                    159            3,249      94
Netherlands                139            4,156      90
New Zealand                243            6,256      95
Norway                     135            3,837      71
Poland                     148            4,854      95
Qatar                      119            6,680      94
Romania                    146            4,273      97
Russian Federation         232            4,720      97
Scotland                   130            3,775      81
Singapore                  178            6,390      95
Slovak Republic            167            5,380      94
Slovenia                   145            5,337      93
South Africa               397            14,657     88
Spain                      152            4,094      97
Sweden                     147            4,394      96
Trinidad and Tobago        147            3,951      94
United States              183            5,190      82

NOTE: The overall weighted response rate is the product of the school participation rate, after replacement, and the student participation rate, after replacement.

SOURCE: International Association for the Evaluation of Educational Achievement, Progress in International Reading Literacy Study (PIRLS), 2006.

In the United States, 2.8 percent of students were 
excluded from PIRLS on the basis of these criteria. In 
keeping with international protocol, no testing accom- 
modations were offered to students. 

The overall exclusion rate was 5.9 percent in the United
States, which means that the overall U.S. coverage rate
(94.1 percent) was 0.9 percentage points below the
recommended 95 percent. Other
jurisdictions that had exclusion rates above 5.0 percent 
included Bulgaria (6.4); the province of Ontario, Canada 
(8.3); Israel (22.5); Italy (5.3); Lithuania (5.1); New 
Zealand (5.3); and the Russian Federation (7.7). 
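The coverage arithmetic above (coverage = 100 minus the exclusion rate, compared with the recommended 95 percent) can be applied to each jurisdiction listed. A minimal sketch, using the exclusion rates given in the text; the function name is illustrative.

```python
RECOMMENDED_COVERAGE = 95.0  # percent of the target population


def coverage_shortfall(exclusion_rate):
    """Percentage points by which coverage falls below 95 percent (0 if none)."""
    coverage = 100.0 - exclusion_rate
    return max(0.0, round(RECOMMENDED_COVERAGE - coverage, 1))


# Overall exclusion rates (percent) reported in the text.
exclusion_rates = {
    "United States": 5.9,
    "Bulgaria": 6.4,
    "Canada, Ontario": 8.3,
    "Israel": 22.5,
    "Italy": 5.3,
    "Lithuania": 5.1,
    "New Zealand": 5.3,
    "Russian Federation": 7.7,
}
shortfalls = {name: coverage_shortfall(rate) for name, rate in exclusion_rates.items()}
```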



Table B-4. Percentage of U.S. students excluded from PIRLS at the school-listing level and student-listing level: 2001 and 2006

Level                                    2001    2006
Total                                    5.3     5.9
Excluded at the school-listing level     0.6     3.2
Excluded at the student-listing level    4.7     2.8

NOTE: Detail may not sum to totals because of rounding.
SOURCE: International Association for the Evaluation of Educational Achievement, Progress in International Reading Literacy Study (PIRLS), 2001 and 2006.



Nonresponse bias analysis 

The analysis of school nonresponse was conducted in 
two parts. The basis for both analyses was the original 
sample of 214 eligible schools. First, the distribution of
the 120 responding original sample schools was compared
with that of the total sample of eligible original
schools. All original schools in the sample that declined 
to participate in the study were treated as nonpartici- 
pants regardless of whether they were substituted by a 
replacement school. In the second part, replacement 
schools were included in the analysis, reflecting the final 
sample of schools that participated in PIRLS 2006. 

Seven variables were examined using the original 
sample, the participating schools from the origi- 
nal sample, and the participating schools in the 
final sample: (1) public/private school control, 
(2) locale, (3) region, (4) percentage of students eli- 
gible for free or reduced-price lunch, (5) total school 
enrollment, (6) fourth-grade enrollment, and (7) relative 
enrollment of racial and ethnic groups (White, non- 
Hispanic; Black, non-Hispanic; Hispanic; Asian or Pacific 
Islander; American Indian or Alaska Native; and other). 

Measures of bias and relative bias were computed, and 
the hypothesis of independence between the char- 
acteristic and participation status was tested using 
chi-square statistics. In addition, logistic regression
models were used to evaluate whether any of these 
characteristics were significant in predicting response 
status. A comparison of the participating schools from 
the original sample with the total eligible sample of 
schools found that school composition was significantly 
different across the two groups: the mean percentage 
of Asian students in schools in the eligible sample was 
3.5 percent, while among participating original sample 
schools it was 2.4 percent; the measure of bias is 1.07. 
No other variables were found to differ significantly 
between these two groups. 
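The bias measures and the independence test described above can be sketched as follows. This assumes one common set of definitions: bias as the difference between the respondent mean and the eligible-sample mean, relative bias as that difference divided by the eligible-sample mean, and an unweighted Pearson chi-square test on a 2 x 2 table. The exact formulas and weighting used in Krotki and Bland (2007) are not given here. With the rounded percentages of Asian students from the text, the bias magnitude is 1.1, close to the reported 1.07, which was presumably computed from unrounded values. The 2 x 2 counts are hypothetical.

```python
import math


def bias_measures(eligible_mean, respondent_mean):
    """Bias = respondent mean - eligible mean; relative bias = bias / eligible mean."""
    bias = respondent_mean - eligible_mean
    return bias, bias / eligible_mean


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2 x 2 table (df = 1).

    Returns the statistic and its p-value; with one degree of freedom the
    upper-tail probability equals erfc(sqrt(x / 2)).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic, math.erfc(math.sqrt(statistic / 2))


# Rounded mean percentages of Asian students: eligible sample vs. participants.
bias, relative_bias = bias_measures(3.5, 2.4)

# Hypothetical counts: rows = characteristic present/absent,
# columns = participated/declined.
stat, p = chi_square_2x2([[30, 70], [70, 30]])
```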

In the second analysis, the final sample of all par- 
ticipating schools (both original and replacement) was 
compared to the total eligible sample. In this analysis, 
the percentage of Asian students in the school was 
not significantly different between the two groups. 
However, the number of fourth-grade students enrolled 
in the school was related to nonresponse. Schools with 
fewer students enrolled in fourth grade (schools with 
an average of 67 students in the fourth grade) were 
less likely to participate than larger schools (schools
with an average of 71.2 students in fourth grade); the 
measure of bias is 4.17. It is unclear whether this bias 
has any impact on student achievement scores. More 
detailed information on nonresponse bias analysis, 
including item nonresponse analysis, can be found in 
Krotki and Bland (2007). 

Test development 

The International Study Center (ISC), which organized 
and managed the international components of PIRLS, 
developed an assessment framework used to guide the 
test development process (Mullis et al. 2006). PIRLS 
was designed to assess two purposes of reading: read- 
ing for literary experience and reading to acquire and 
use information. In addition, the PIRLS assessment 
evaluates four processes of comprehension: (1) to focus 
on and retrieve explicitly stated information; (2) to 
make straightforward inferences; (3) to interpret and 
integrate ideas and information; and (4) to examine 
and evaluate content, language, and textual elements. 

Jurisdictions participating in PIRLS 2006 were invited 
to submit reading passages to be used in the test. Two 
types of passages were sought: literary texts, which 
were typically narrative fiction, and informational 
texts, which could include biographies, step-by-step 
directions, informational leaflets, and scientific or
other nonfictional material. All passages were to be
authentic texts typical of the reading material in
their jurisdictions, well suited to fourth-grade
students, and no longer than 1,000 words. The national
research coordinators from participating jurisdictions
were asked to review the texts and work together
to agree on a shortened list of passages to be
illustrated and formatted. Questions for each passage
were refined by PIRLS project staff and reviewed by a
group of reading experts. Each reading passage,
including text and questions, was designed to be
completed in 40 minutes.

Twelve new passages were created and tested during 
a field trial in spring 2005. Item statistics, including 
item difficulties, point-biserial correlations, and item
discrimination statistics, were calculated for each item 
for each jurisdiction. After a careful review of the qual- 
ity of all items across jurisdictions, 6 of these passages, 
3 literary and 3 informational, were selected for the 
main study. 

These passages, along with 4 passages from PIRLS 
2001, were used to create the test booklets for the 
main study. The same 10 passages were used in all 
participating PIRLS jurisdictions. Each test booklet 
contained 2 reading passages. Students were given 40 
minutes to complete each passage, or 80 minutes in 
all. The passages were distributed across 13 booklet 
types. Students were asked to answer a number of items 
related to each passage, including both multiple-choice 
and constructed-response items. The distribution of the 
items by type of passage and type of item is shown in 
table B-5. 
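The text does not describe how the 13 booklet types were handed out to students. A common approach in matrix-sampled assessments is to rotate booklet types through the class list so that each type is used about equally often; the sketch below assumes that approach, with a random starting booklet for each classroom. The class list and seed are hypothetical.

```python
import random
from collections import Counter

N_BOOKLET_TYPES = 13


def assign_booklets(student_ids, rng, n_types=N_BOOKLET_TYPES):
    """Rotate booklet types 1..n_types through a class list from a random start."""
    start = rng.randrange(n_types)
    return {sid: (start + k) % n_types + 1 for k, sid in enumerate(student_ids)}


class_list = [f"S{k:02d}" for k in range(26)]  # hypothetical class of 26 students
booklets = assign_booklets(class_list, random.Random(7))
print(Counter(booklets.values()))
```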

In addition to the assessment, students were asked to
complete a 20-minute questionnaire. The questionnaire
included items about students' reading experiences in
school, self-perception and attitudes toward reading,
out-of-school reading habits and computer use, home
literacy resources, and basic demographic information.

Table B-5. Distribution of items on the PIRLS 2006 assessment

                   Multiple-  Constructed-response items    Total     Total
Reading            choice     1 point  2 points  3 points   number    score
purpose            items                                    of items  points
Total              64         28       27        7          126       167
Literary           34         13       13        4          64        85
Informational      30         15       14        3          62        82

SOURCE: International Association for the Evaluation of Educational Achievement, Progress in International Reading Literacy Study (PIRLS), 2006.
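The item and score-point totals in Table B-5 follow from the item counts. The sketch below recomputes them, assuming each multiple-choice item is worth 1 point; that assumption reproduces the table's totals exactly.

```python
# Item counts from Table B-5, by reading purpose.
ITEM_COUNTS = {
    "Literary": {"multiple_choice": 34, "cr_1pt": 13, "cr_2pt": 13, "cr_3pt": 4},
    "Informational": {"multiple_choice": 30, "cr_1pt": 15, "cr_2pt": 14, "cr_3pt": 3},
}
# Score points per item type (multiple-choice items assumed to be worth 1 point).
POINTS = {"multiple_choice": 1, "cr_1pt": 1, "cr_2pt": 2, "cr_3pt": 3}


def item_and_point_totals(counts):
    """Return (number of items, total score points) for one row of the table."""
    n_items = sum(counts.values())
    score_points = sum(POINTS[kind] * n for kind, n in counts.items())
    return n_items, score_points


for purpose, counts in ITEM_COUNTS.items():
    print(purpose, item_and_point_totals(counts))
# prints "Literary (64, 85)" then "Informational (62, 82)"
```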

Translation 

Source versions of all instruments (assessment booklets, 
questionnaires, and manuals) were prepared in English 
and then translated into the primary language or lan- 
guages of instruction in each jurisdiction. In addition, 
it was sometimes necessary to adapt the instrument 
for cultural purposes, even in jurisdictions such as the 
United States that use English as the primary language 
of instruction. For example, words such as "lift" might 
be adapted to "elevator" for the United States. The IEA
and ISC verified the translations and adaptations used
by all participating jurisdictions. Certified translators
were retained by the IEA to compare national versions
with the source versions of all documents.

Test administration and quality 
assurance 

PIRLS 2006 emphasized the use of standardized pro- 
cedures in all jurisdictions. Each jurisdiction was 
responsible for its own data collection; however, the 
IEA insisted that all jurisdictions use the procedures
and materials developed by the international project 
team. The ISC developed standardized survey operations 
manuals that were used in all jurisdictions, as well as 
manuals for participating schools and test administra- 
tors, to ensure that data collection processes were con- 
sistent across jurisdictions. In addition, jurisdictions 
used standardized listing forms for student participa- 
tion and standardized session report forms. 

Test administration in the United States was carried out 
by a professional staff trained according to the interna- 
tional guidelines. School personnel were asked to assist 
with listing classrooms and students, selecting a test 
day, and selecting the parental consent procedures to 
be used at that school. Test administrators were respon- 
sible for all other aspects of the administration. 
The ISC conducted quality monitoring visits at
approximately 15 sampled schools in each jurisdiction.
The international quality monitors were trained by the
staff of the ISC and the IEA Secretariat. After each
visit, the quality monitor completed a standard form
describing the test session and any deviations from
international protocols.

In addition, each jurisdiction was encouraged to con- 
duct its own national quality monitoring operation. In 
the United States, a sample of 10 percent of schools 
was selected for monitoring. Project staff and field 
supervisory staff visited selected schools during the 
assessment administration and completed a classroom 
observation record immediately after the visit. 

Both international and national quality monitors were 
asked to verify that student and class lists were pre- 
pared correctly by the school personnel; verify the 
completeness and security of the test booklets; check 
when possible that the international guidelines con- 
cerning the exclusion of students had been properly 
followed; keep an independent record of session tim- 
ing; verify adherence to the script and instructions 
outlined in the test administrator manual; check that 
materials were distributed correctly; indicate whether 
the students were cooperative during the test session; 
and note whether the test administrator monitored 
that students were working in the correct section of 
the test booklet. 

Scoring 

PIRLS contained a large number of constructed-response 
items, as discussed in the test development section. 
The process of scoring these items was an important 
step in ensuring the quality and cross-jurisdiction 
comparability of the PIRLS data. Detailed guidelines 
were developed for the scoring guides themselves, and 
training materials were prepared including an extensive 
set of anchor and practice papers. These materials were 
prepared by the ISC with the advice and guidance of an 
international group of experts. 
In spring 2006, the ISC organized an international
training session to present the material and train the
scoring coordinators from participating jurisdictions, who
in turn trained the national scorers. For each test item,
the scoring guide described the intent of the question
and how to code students' responses to each item.
This description included guidelines for assigning full 
credit, partial credit, or no credit for each item. During 
the training session, PIRLS staff discussed the scor- 
ing guidelines for each item and reviewed the anchor 
papers (selected examples of real student answers) for 
each item. Trainees were asked to complete the practice 
papers, and the answers were then discussed. 

The criteria described in the scoring guides related 
only to evidence of reading comprehension. Students' 
writing abilities were not evaluated. A student could 
receive a high score for an item if the ideas expressed 
in the response exhibited a high level of understand- 
ing, even if the response contained misspellings or 
grammatical errors. Given that PIRLS was a timed test, 
responses were considered "first-draft writing." 

The reliability of coding was assessed in three ways. 
First, to establish within-jurisdiction scoring reliability, 
it was necessary for two different scorers to indepen- 
dently score a random sample of 200 responses for 
each constructed-response item. The degree of agree- 
ment between the scores assigned by the two scorers 
was a measure of the reliability of the scoring process. 
The average percent agreement across items was 93
percent for both the United States and the international
average. Second, international scoring reliability was
assessed by having each jurisdiction use the IEA's
Cross-Country Scoring Reliability software to
score a common set of answers selected from field test 
and PIRLS 2001 responses. Finally, in jurisdictions that 
participated in both 2001 and 2006, the staff scoring 
the 2006 responses were also asked to score a sample 
of 2001 responses. The scores assigned in 2006 were 
then compared with the actual scores assigned to those 
responses in 2001. Information on trend and cross- 
jurisdiction reliability is available in the international 
technical report (Martin et al. 2007). 
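The within-jurisdiction reliability measure described above is an average, over constructed-response items, of the percentage of double-scored responses on which the two scorers agree. A minimal sketch, assuming agreement means both scorers assigned exactly the same score code; the scores are hypothetical.

```python
def percent_agreement(scores_a, scores_b):
    """Percentage of responses given the same code by two independent scorers."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(scores_a, scores_b))
    return 100.0 * matches / len(scores_a)


def average_agreement(items):
    """Average the per-item percent agreement across constructed-response items."""
    rates = [percent_agreement(a, b) for a, b in items]
    return sum(rates) / len(rates)


# Hypothetical double-scored responses for two items (codes 0 to 2).
item_1 = ([2, 1, 0, 1], [2, 1, 1, 1])
item_2 = ([0, 0, 2, 2], [0, 0, 2, 2])
```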



Data entry and cleaning 

The national research coordinator for each jurisdiction
assumed responsibility for data entry. All data were
entered into a data entry system developed by the IEA
Data Processing Center (IEA-DPC) with a number of
built-in data quality checks. In addition, each
jurisdiction was required to run a number of validity
checks (e.g., checking the links among teachers, schools,
and students) before delivering the data to the IEA-DPC.
The IEA-DPC conducted a number of additional cleaning
steps before providing each jurisdiction with a version
of the cleaned data to be reviewed and accepted by the
jurisdiction. The U.S. data were cleared through this
process, and no major issues were found.

Weighting and variance estimation 

Using sampling weights is necessary for computing 
statistically sound, nationally representative estimates. 
Survey weights help adjust for the intentional over- or 
undersampling of certain sectors of the population, 
school or student nonresponse, or errors in estimating 
the size of a school at the time of sampling. Survey 
weighting for the entire international PIRLS 2006 
sample was carried out by Statistics Canada. 

The internationally defined weighting specifications
for PIRLS required that each assessed student's sampling
weight be the product of six weighting factors: the
inverse of the school's probability of selection, an
adjustment for school-level nonresponse, the inverse
of the classroom's probability of selection, an
adjustment for classroom-level nonresponse, the inverse
of the student's probability of selection (always equal
to 1 because whole classrooms were selected), and an
adjustment for student-level nonresponse.
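The weight construction above is a straightforward product of six factors. The sketch below shows that product for one student; the probabilities and adjustment factors are hypothetical, and how Statistics Canada computed the nonresponse adjustments themselves is not shown.

```python
from dataclasses import dataclass


@dataclass
class WeightFactors:
    """Inputs for one student's sampling weight (values used below are hypothetical)."""
    p_school: float               # school's probability of selection
    school_nonresponse: float     # school-level nonresponse adjustment
    p_classroom: float            # classroom's probability of selection
    classroom_nonresponse: float  # classroom-level nonresponse adjustment
    student_nonresponse: float    # student-level nonresponse adjustment
    p_student: float = 1.0        # whole classrooms taken, so always 1 in PIRLS


def student_weight(f: WeightFactors) -> float:
    """Product of the six weighting factors listed in the text."""
    return ((1.0 / f.p_school) * f.school_nonresponse
            * (1.0 / f.p_classroom) * f.classroom_nonresponse
            * (1.0 / f.p_student) * f.student_nonresponse)


w = student_weight(WeightFactors(p_school=0.02, school_nonresponse=1.10,
                                 p_classroom=0.5, classroom_nonresponse=1.0,
                                 student_nonresponse=1.05))
```

Here the weight is 50 x 1.10 x 2 x 1.0 x 1 x 1.05 = 115.5, so this student represents about 115 fourth-graders in the population.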

The statistics presented in this report are estimates of 
group and subgroup performance based on a sample of 
fourth-graders, rather than the values that could be
calculated if every fourth-grader answered every question
on the instrument. It is therefore important to have
measures of the degree of uncertainty of the estimates.
Accordingly, in addition to providing estimates of per- 
centages of respondents and their average scale score, 
this report provides information about the uncertainty 
of each statistic. 

Because PIRLS used clustered sampling, conventional 
formulas for estimating sampling variability that assume 
simple random sampling and hence independence of 
observations are inappropriate. For this reason, PIRLS 
used a jackknife repeated replication method (Johnson 
and Rust 1992) to estimate standard errors that capture 
the sampling variance. 
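A minimal sketch of paired jackknife repeated replication, as it is commonly described for IEA studies: sampled schools are grouped in pairs (jackknife zones); for each zone, a replicate estimate is computed with the weights of one school in the pair doubled and those of its partner set to zero, all other weights unchanged; and the sampling variance is the sum of squared differences between the replicate estimates and the full-sample estimate. The zone assignments, which member is doubled, and the data are assumptions for illustration.

```python
from collections import namedtuple

# One record per student: jackknife zone, school's position in its zone pair
# (0 or 1), final student weight, and the value being averaged.
Record = namedtuple("Record", "zone member weight value")


def weighted_mean(records, weights):
    return sum(w * r.value for r, w in zip(records, weights)) / sum(weights)


def jrr_mean_and_se(records):
    """Weighted mean and its paired-jackknife standard error."""
    full = weighted_mean(records, [r.weight for r in records])
    variance = 0.0
    for zone in sorted({r.zone for r in records}):
        replicate_weights = [
            r.weight if r.zone != zone
            else (2.0 * r.weight if r.member == 0 else 0.0)
            for r in records
        ]
        variance += (weighted_mean(records, replicate_weights) - full) ** 2
    return full, variance ** 0.5


sample = [Record(1, 0, 1.0, 10.0), Record(1, 1, 1.0, 20.0),
          Record(2, 0, 1.0, 30.0), Record(2, 1, 1.0, 40.0)]
mean_score, se = jrr_mean_and_se(sample)
```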

Scaling and plausible values 

Each student who completed the PIRLS assessment read 
2 passages, rather than all 10 passages developed for 
the study, to keep individual response burden to a mini- 
mum. PIRLS used a matrix-sampling design to assign 
passages to booklets. Item Response Theory (IRT) was 
then used to combine these responses to provide accu- 
rate estimates of reading achievement in the student 
population in each jurisdiction. 

As was done in 2001, PIRLS used three distinct scaling 
models: a three-parameter model for multiple-choice 
items, a two-parameter model for constructed-response 
items that were scored as correct or incorrect, and a 
partial credit model for constructed-response items with 
more than two score points. 
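The three scaling models can be written as item response functions. The sketch below uses the usual logistic forms; the scaling constant D = 1.7 is an assumption (it is not stated in this appendix), the polytomous function is written in generalized partial credit form (setting a = 1 and D = 1 gives the Rasch-type partial credit model), and all parameter values in the usage lines are hypothetical.

```python
import math

D = 1.7  # logistic scaling constant; assumed, not stated in this appendix


def p_three_pl(theta, a, b, c):
    """Probability of a correct answer under the three-parameter logistic model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def p_two_pl(theta, a, b):
    """Two-parameter model: the 3PL with the guessing parameter fixed at 0."""
    return p_three_pl(theta, a, b, 0.0)


def p_partial_credit(theta, a, step_difficulties):
    """Probabilities of scores 0..m for an item with m steps.

    The unnormalized log-probability of score k is the sum of
    D * a * (theta - b_v) over steps v = 1..k (0 for score 0).
    """
    logits = [0.0]
    for b_v in step_difficulties:
        logits.append(logits[-1] + D * a * (theta - b_v))
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


p_mc = p_three_pl(theta=0.5, a=1.2, b=0.5, c=0.2)
p_cr = p_partial_credit(theta=0.3, a=0.9, step_difficulties=[-0.5, 0.4, 1.1])
```

A one-step item under the partial credit function gives the same probability of full credit as the two-parameter model.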

Because each student completed only a limited set 
of items, plausible values were calculated to estimate 
student-level scores. PIRLS generated five possible 
scale scores for each student, which represented selec- 
tions from the distribution of scale scores of students 
with similar backgrounds who answered the assessment 
items the same way. The plausible values methodology 
is one way to ensure that the estimates of the mean 
performance of student subpopulations and the esti- 
mates of variability in those means are more accurate 
than those determined through traditional procedures,
which estimate a single score for each student. During
the construction of plausible values, careful quality
control steps ensure that the subpopulation estimates
based on these plausible values are accurate.

It is important to recognize that plausible values are 
not test scores for individuals, and they should not be 
treated as such. Plausible values are randomly drawn 
from the distribution of scores that could be reason- 
ably assigned to each individual. As such, the plausible 
values contain random error variance components and 
are not optimal as scores for individuals. The PIRLS 
student file contains 15 plausible values per student, 
5 for each of the three scales (the combined reading 
literacy scale, the literary subscale, and the informa- 
tional subscale). If an analysis is to be undertaken 
with one of these scales, then (ideally) the analysis 
should be undertaken five times, once with each of 
the 5 relevant plausible value variables. The results of 
these five analyses are averaged, and then significance 
tests that adjust for variation between the five sets of 
results are computed. 
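The five analyses described above are typically combined using the multiple-imputation rules due to Rubin: the reported estimate is the average of the five estimates, and its variance is the average sampling variance (for example, from the jackknife) plus (1 + 1/M) times the variance of the estimates across plausible values. A minimal sketch under that assumption; the input numbers are hypothetical, not PIRLS results.

```python
from statistics import mean, variance


def combine_plausible_values(estimates, sampling_variances):
    """Combine one statistic computed separately with each plausible value.

    estimates: the statistic from each of the M analyses (M = 5 in PIRLS).
    sampling_variances: the sampling variance of each estimate.
    Returns the combined estimate and its standard error.
    """
    m = len(estimates)
    if m < 2 or len(sampling_variances) != m:
        raise ValueError("need at least two estimates with matching variances")
    point = mean(estimates)
    within = mean(sampling_variances)   # average sampling variance
    between = variance(estimates)       # variance across plausible values
    total = within + (1 + 1 / m) * between
    return point, total ** 0.5


# Hypothetical results of one analysis run five times (not PIRLS estimates).
est, se = combine_plausible_values([540, 542, 538, 541, 539], [9.0] * 5)
```

Here the between-value variance is 2.5, so the total variance is 9 + 1.2 x 2.5 = 12.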

Descriptions of background variables 

In the United States, background questionnaires were
administered to students, school administrators, and
teachers. The information collected from the background
questionnaires provides a context for interpreting the
results from the assessment. The following background
variables are presented in this report:

Sex 

Students were asked to indicate whether they were a 
boy or a girl. 

Race/ethnicity 

School administrators were asked to classify the race/ 
ethnicity of each sampled student into one or more of 
the following categories: 
• White
• Black
• Hispanic
• Asian
• American Indian/Alaska Native
• Pacific Islander

For reporting, all students who were identified as 
Hispanic by their school's administrator were classified 
as Hispanic, regardless of their race. The remaining 
categories include only students who were identified 
as non-Hispanic. The other, non-Hispanic category
includes non-Hispanic students identified as Pacific
Islander as well as non-Hispanic students identified
as belonging to multiple racial groups. Because the
numbers of Pacific Islander and multiple-race students
were each too small to report separately (fewer than 30
students in each group), the two groups were combined
into the other, non-Hispanic category.
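The reporting rules above can be sketched as a small classification function. This is an illustration only. The function name, the input format (the set of categories an administrator checked), and the treatment of an empty classification as missing are assumptions rather than the study's actual processing code.

```python
REPORTED_RACES = {"White", "Black", "Asian", "American Indian/Alaska Native"}
KNOWN = REPORTED_RACES | {"Hispanic", "Pacific Islander"}


def reporting_category(categories):
    """Map the categories an administrator checked to a reporting group."""
    categories = set(categories)
    unknown = categories - KNOWN
    if unknown:
        raise ValueError(f"unrecognized categories: {sorted(unknown)}")
    if not categories:
        return None                      # no classification: treated as missing
    if "Hispanic" in categories:
        return "Hispanic"                # Hispanic regardless of race
    if len(categories) == 1:
        (race,) = categories
        if race in REPORTED_RACES:
            return race                  # single reported race, non-Hispanic
    # Pacific Islander alone, or two or more races (all non-Hispanic here)
    return "Other, non-Hispanic"


print(reporting_category({"Hispanic", "White"}))
print(reporting_category({"White", "Asian"}))
```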

In 2001, data about the race and ethnicity of students 
were collected directly from student responses. The 
2001 student background questionnaire also defined
White and Black as White (not Hispanic) and Black 
(not Hispanic), respectively. Because the classification 
of racial/ethnic categories and procedures for collect- 
ing data on race/ethnicity changed between 2001 and 
2006, no comparisons between racial/ethnic groups in 
2001 and 2006 are presented in this report. 

School poverty level 

In this report, the percentage of students in schools 
eligible for the National School Lunch Program (NSLP) 
is used as a measure of a school's poverty level. The 
guidelines for the NSLP stipulate that children from 
families with incomes at or below 130 percent of 
the federal poverty level are eligible for free meals, 
while those between 130 percent and 185 percent of 
the federal poverty level are eligible for reduced-price
meals. (For the period July 1, 2005, through June 30,
2006, for a family of four, 130 percent of the poverty
level was $25,155 per year, and 185 percent was
$35,798. See http://www.fns.usda.gov/cnd/lunch/ for
more information.)
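The eligibility rule above can be made concrete as a threshold check that uses the family-of-four dollar limits quoted for 2005-06. This is a minimal sketch. It applies only to a family of four in that period, the function name is hypothetical, and the text does not say whether the 185 percent bound is inclusive, so treating it as "at or below" is an assumption.

```python
# 2005-06 thresholds for a family of four, as quoted in the text.
FREE_LIMIT = 25_155      # 130 percent of the federal poverty level
REDUCED_LIMIT = 35_798   # 185 percent of the federal poverty level


def lunch_eligibility(annual_income):
    """Classify a family-of-four income for July 2005 through June 2006."""
    if annual_income <= FREE_LIMIT:
        return "free"
    if annual_income <= REDUCED_LIMIT:   # assumed inclusive upper bound
        return "reduced-price"
    return "not eligible"


for income in (20_000, 30_000, 40_000):
    print(income, lunch_eligibility(income))
```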

Information about the poverty level of a school was col- 
lected from school administrators. Administrators were 
asked to indicate the percentage of students in their 
schools eligible for free or reduced-price lunch using 
the following categories: All, Some, or None. 

Data limitations 

As with any study, there are limitations to PIRLS that 
researchers should take into consideration. Estimates 
produced using data from PIRLS are subject to two 
types of error: nonsampling errors and sampling errors. 
Nonsampling errors can be due to errors made in the 
collection and processing of data. Sampling errors can 
occur because the data were collected from a sample 
rather than a complete census of the population. In 
addition to sampling errors, researchers should also 
be aware of missing data issues and how these issues 
were addressed. 

Nonsampling errors 

Nonsampling error is a term used to describe variations 
in the estimates that may be caused by population 
coverage limitations, nonresponse bias, and measure- 
ment error, as well as data collection, processing, and 
reporting procedures. For example, the sampling frame 
was limited to regular public and private schools in 
the 50 states and the District of Columbia and did not 
include Puerto Rico or the U.S. Trust Territories. The 
sources of nonsampling errors are typically problems 
such as unit and item nonresponse, the differences 
in respondents' interpretations of the meaning of the 
survey guestions, response differences related to the 
particular time the survey was conducted, and mistakes 
in data preparation. Some of these issues (particularly 
unit nonresponse) are discussed above in the section 
entitled "Sampling, data collection, and response rates 
in the United States." Note that this is a school-based 
sample; home-schooled children are not included. 
It is difficult to identify and estimate either the amount
of nonsampling error or the bias caused by this error. 
In PIRLS, efforts were made to prevent such errors from 
occurring and to compensate for them when possible. 
For example, the design phase entailed a field test that 
evaluated items as well as the implementation proce- 
dures for the survey. It should also be recognized that 
background information was obtained from students' 
self-reports, which are subject to several different forms 
of response bias. 

Sampling errors 

Sampling errors occur when a discrepancy between 
a population characteristic and the sample estimate 
arises because not all members of the target population 
are sampled for the survey. Both the size of the sample 
relative to the population and the variability of the 
population characteristics influence the magnitude of 
sampling error. The particular sample of students drawn 
in March 2005 was just one of many possible samples 
that could have been selected. Therefore, estimates 
produced from the PIRLS 2006 sample may differ from 
estimates that would have been produced had another 
sample of fourth-grade students been selected. This 
type of variability is called sampling error because it 
arises from using a sample of fourth-grade students in 
2006 rather than all fourth-grade students that year. 
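The points above, that a different sample would give a different estimate and that larger samples vary less, can be illustrated by simulation. The sketch below assumes simple random sampling from a synthetic population of scores. It does not reproduce the clustered PIRLS design, under which sampling error is typically larger than the simple-random-sampling formula suggests. The function name and population parameters are hypothetical.

```python
import random
import statistics


def sd_of_sample_means(population, n, replications, seed=0):
    """Draw repeated simple random samples and return the SD of their means."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.sample(population, n))
             for _ in range(replications)]
    return statistics.stdev(means)


rng = random.Random(42)
population = [rng.gauss(500, 100) for _ in range(10_000)]  # synthetic scores

for n in (100, 400):
    simulated = sd_of_sample_means(population, n, replications=1_000)
    # Textbook SRS standard error; ignores the finite-population correction.
    formula = statistics.pstdev(population) / n ** 0.5
    print(n, round(simulated, 2), round(formula, 2))
```

Quadrupling the sample size roughly halves the spread of the sample means.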

Missing data 

Items with missing data were designated with one 
of four missing data codes: (1) omitted response or 
uninterpretable, (2) not administered, (3) not reached, 
and (4) not applicable. An "omitted response" occurred 
when a respondent was expected to answer an item 
but gave no response. An item was coded as "unin- 
terpretable" if some type of response was given but 
it was either invalid or unreadable. Items that were 
not administered, either by design or by error (e.g., 
a printing problem), were coded as "not administered." 
For assessment questions, the missing data code "not
reached" was assigned to consecutive missing values
starting from the end of the assessment passage. In the
questionnaire data files, a code of "not applicable" was
assigned to items that respondents were instructed to
skip. All four kinds of missing data were coded distinctly
in the PIRLS database.
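The "not reached" rule above can be sketched as a backward scan over one passage's responses. The enum labels follow the four codes listed in the text, but the numeric values are arbitrary and are not the codes stored in the PIRLS files. Representing a blank response as `None` is also an assumption.

```python
from enum import Enum


class Missing(Enum):
    # Labels follow the four codes listed in the text; the numbers are
    # arbitrary and are NOT the values stored in the PIRLS database.
    OMITTED_OR_UNINTERPRETABLE = 1
    NOT_ADMINISTERED = 2
    NOT_REACHED = 3
    NOT_APPLICABLE = 4


def code_passage(responses):
    """Recode blanks (None) in one passage's item responses, in item order."""
    coded = list(responses)
    i = len(coded) - 1
    # Consecutive blanks at the end of the passage are "not reached".
    while i >= 0 and coded[i] is None:
        coded[i] = Missing.NOT_REACHED
        i -= 1
    # Earlier blanks were skipped by a student who kept going: "omitted".
    return [Missing.OMITTED_OR_UNINTERPRETABLE if r is None else r
            for r in coded]


print(code_passage(["A", None, "C", None, None]))
```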

Background data were not imputed for cases with missing
data. Item response rates for the variables discussed in
this report exceeded the NCES standard of 85 percent
(weighted) for reporting without notation.
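A weighted item response rate of the kind referred to above can be sketched as the weighted share of eligible respondents who answered the item. The exact NCES definition is not given here. Excluding "not applicable" cases from the denominator and flagging rates below 85 percent are assumptions, and the function name is hypothetical.

```python
NCES_STANDARD = 85.0  # percent, weighted


def weighted_item_response_rate(weights, answered):
    """Weighted percentage of item-eligible respondents who answered.

    weights: sampling weight of each respondent to whom the item applied
    (respondents coded "not applicable" are assumed already excluded).
    answered: True where a valid response was recorded.
    """
    eligible = sum(weights)
    responded = sum(w for w, a in zip(weights, answered) if a)
    return 100 * responded / eligible


rate = weighted_item_response_rate([1.0, 2.0, 3.0, 4.0], [True, True, False, True])
print(rate, "needs notation" if rate < NCES_STANDARD else "ok")
```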

Confidentiality and disclosure limitations

The PIRLS data are hierarchical and include school, 
teacher, and student data from the participating schools. 
Confidentiality analyses for the United States were 
designed to provide reasonable assurance that public-use 
data files issued by the IEA would not allow the identification
of individual U.S. schools, students, or teachers
when compared against public-use data collections. 
Disclosure limitation included identifying and masking 
potential disclosure risk to PIRLS schools and adding an 
additional measure of uncertainty to school and student 
identification through random swapping of data ele- 
ments within the student, teacher, and school files. 

Statistical procedures 

Tests of significance 

All comparisons discussed in this report have been 
tested for statistical significance using the t statistic. 
Statistical significance was determined by calculating a 
t value for the difference between a pair of means, or 
proportions, and comparing this value with published 
tables of values at a certain level of significance, called 
the alpha level. The alpha level is an a priori statement 
of the probability of inferring that a difference exists 
when, in fact, it does not. The alpha level used in this 
report is .05, based on a two-tailed test. 
The calculation of the t statistic varied depending on
the type of analysis. For comparisons between indepen- 
dent samples (e.g., an average score for U.S. students 
compared with an average score for students in another 
jurisdiction) or between the U.S. average and the interna- 
tional average,^ the t statistic was calculated as follows: 

$$t = \frac{est_1 - est_2}{\sqrt{se_1^2 + se_2^2}}$$

where $est_1$ and $est_2$ are the estimates to be compared and
$se_1$ and $se_2$ are their corresponding standard errors.
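The independent-samples formula and the two-tailed .05 decision rule described above can be sketched as follows. The report compared t with published tables. This sketch assumes the large-sample normal critical value (about 1.96) instead. The estimates and standard errors in the example are hypothetical.

```python
import math
from statistics import NormalDist


def independent_t(est1, se1, est2, se2):
    """t statistic for two independent estimates and their standard errors."""
    return (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)


# Two-tailed critical value at alpha = .05 (large-sample normal approximation).
CRITICAL = NormalDist().inv_cdf(1 - 0.05 / 2)   # about 1.96


def is_significant(t, critical=CRITICAL):
    return abs(t) > critical


t = independent_t(540, 3.5, 532, 3.0)   # hypothetical estimates and SEs
print(round(t, 2), is_significant(t))
```

An 8-point gap with these standard errors falls short of significance, while a 20-point gap clears the threshold.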

For all other comparisons, the t statistic was calculated
by running the jackknife regression procedure
available in the International Database (IDB) Analyzer
software provided by the IEA. Because of the clustered
nature of the PIRLS sample (students within classrooms
within schools), seemingly independent samples (e.g.,
boys and girls) may in fact be correlated. To estimate
the standard error of the difference between groups
in correlated samples, the jackknife regression calculated
the standard error of the difference between the
groups being compared for each of the replicate PIRLS
samples.^ The t statistic was calculated by dividing the
difference between the two estimates being compared
by the average standard error of the difference between
the two comparison groups.

^Because U.S. students contribute to the international average,
the two samples are not entirely independent. When dependent
samples are compared, it is most appropriate to use a different
t-test formula that takes account of the overlap between the
two samples. Tests of differences between the U.S. average
and the international average could not be performed using
dependent samples t-tests because the international data
were unavailable during the time in which the U.S. data were
analyzed. Consequently, the independent samples t statistic
was used when comparing a jurisdiction average to the
international average.

^See Martin et al. (2007) for details on the tests of statistical
significance used for correlated samples.
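The replicate idea behind the jackknife can be sketched by recomputing the group difference under each set of replicate weights. Because the difference itself is recomputed, any correlation between the groups is carried into its standard error. This sketch assumes a JK2-style jackknife in which the variance is the unscaled sum of squared deviations of the replicate estimates from the full-sample estimate. It is not the IDB Analyzer's code. The function names, toy scores, and replicate weights are hypothetical.

```python
import math


def group_gap(scores, in_group_a, weights):
    """Weighted mean of group A minus weighted mean of group B."""
    a = [(s, w) for s, g, w in zip(scores, in_group_a, weights) if g]
    b = [(s, w) for s, g, w in zip(scores, in_group_a, weights) if not g]
    mean_a = sum(s * w for s, w in a) / sum(w for _, w in a)
    mean_b = sum(s * w for s, w in b) / sum(w for _, w in b)
    return mean_a - mean_b


def jackknife_gap(scores, in_group_a, full_weights, replicate_weights):
    """Return (gap, standard error, t) using replicate weights.

    The gap is recomputed with every set of replicate weights, so any
    correlation between the two groups induced by the clustered design
    is reflected in the standard error.
    """
    gap = group_gap(scores, in_group_a, full_weights)
    sq_dev = sum((group_gap(scores, in_group_a, rw) - gap) ** 2
                 for rw in replicate_weights)
    se = math.sqrt(sq_dev)   # JK2-style: sum of squared deviations, unscaled
    return gap, se, gap / se


# Toy data: four students, two replicate weight sets (hypothetical).
scores = [500, 520, 540, 560]
is_girl = [False, False, True, True]
gap, se, t = jackknife_gap(scores, is_girl, [1, 1, 1, 1],
                           [[2, 0, 1, 1], [1, 1, 2, 0]])
print(gap, round(se, 2), round(t, 2))
```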

Effect size

Tests of statistical significance are, in part, influenced 
by sample sizes. To provide the reader with an increased 
understanding of the size of the significant difference 
between student populations in the United States, 
effect sizes for selected results are included in the 
report. Effect sizes use standard deviations, rather than 
standard errors, and are therefore not influenced by 
the size of the student population samples. Following 
Cohen (1988) and Rosnow and Rosenthal (1996), 
effect size is calculated by finding the difference 
between the means of two groups and dividing that 
result by the pooled standard deviation of the two 
groups. The formula used to compute effect size (d) 
is as follows: 

$$d = \frac{est_{grp1} - est_{grp2}}{sd_{pooled}}$$

where $est_{grp1}$ and $est_{grp2}$ are the student group estimates being
compared and $sd_{pooled}$ is the pooled standard deviation of
the groups being compared. The formula for the pooled
standard deviation is as follows (Rosnow and Rosenthal
1996):

$$sd_{pooled} = \sqrt{\frac{sd_{grp1}^2 + sd_{grp2}^2}{2}}$$

where $sd_{grp1}$ and $sd_{grp2}$ are the standard deviations of the groups
being compared.
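The two effect size formulas above translate directly into code. This is a minimal sketch. The function names and example group statistics are hypothetical, and in practice the standard deviations would come from the separate software run noted below.

```python
import math


def pooled_sd(sd_grp1, sd_grp2):
    """Pooled SD as the root mean square of the two group SDs."""
    return math.sqrt((sd_grp1 ** 2 + sd_grp2 ** 2) / 2)


def effect_size(est_grp1, est_grp2, sd_grp1, sd_grp2):
    """Difference in group estimates expressed in pooled SD units."""
    return (est_grp1 - est_grp2) / pooled_sd(sd_grp1, sd_grp2)


d = effect_size(550, 540, 60, 80)   # hypothetical group means and SDs
print(round(d, 3))
```

Because d is scaled by standard deviations rather than standard errors, doubling the sample size would leave it unchanged even though the t statistic would grow.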



^The IDB Analyzer software provided by IEA does not provide the
variance or standard deviations of estimates. To calculate these
statistics for effect sizes, the estimates for sex, race/ethnicity,
school control, and school poverty level were re-run using the AM
statistical software package.